lost in solr new core architecture

2014-04-11 Thread Aman Tandon
Hi,

Currently I am using Solr 4.2 with Tomcat, and right now I am stuck because I
don't know how to upgrade to Solr 4.7. The problem for me is that I am familiar
with the core architecture of Solr 4.2, in which we define every core name as
well as its instanceDir, but not with the one in Solr 4.7.
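
From what I have read so far, 4.4 and later use "core discovery" instead of
listing every core in solr.xml - roughly along these lines (the names below are
placeholders, not my actual configuration):

Old style (4.2), cores listed in solr.xml:
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1" />
  </cores>

New style (4.4+), solr.xml no longer lists the cores; each core directory under
the solr home instead contains a core.properties file, e.g. core1/core.properties:
  name=core1
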
Any help will be appreciated, thanks

With Regards
Aman Tandon


Re: deleting large amount data from solr cloud

2014-04-11 Thread Aman Tandon
Vinay please share your experience after trying this solution.


On Sat, Apr 12, 2014 at 4:12 AM, Vinay Pothnis  wrote:

> The query is something like this:
>
>
> curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR
> val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO
> 138516480]</query></delete>'
> 'http://host:port/solr/coll-name1/update?commit=true'
>
> Trying to restrict the number of documents deleted via the date parameter.
>
> Had not tried the "distrib=false" option. I could give that a try. Thanks
> for the link! I will check on the cache sizes and autowarm values. Will try
> and disable the caches when I am deleting and give that a try.
>
> Thanks Erick and Shawn for your inputs!
>
> -Vinay
>
>
>
> On 11 April 2014 15:28, Shawn Heisey  wrote:
>
> > On 4/10/2014 7:25 PM, Vinay Pothnis wrote:
> >
> >> When we tried to delete the data through a query - say 1 day/month's
> worth
> >> of data. But after deleting just 1 month's worth of data, the master
> node
> >> is going out of memory - heap space.
> >>
> >> Wondering is there any way to incrementally delete the data without
> >> affecting the cluster adversely.
> >>
> >
> > I'm curious about the actual query being used here.  Can you share it, or
> > a redacted version of it?  Perhaps there might be a clue there?
> >
> > Is this a fully distributed delete request?  One thing you might try,
> > assuming Solr even supports it, is sending the same delete request
> directly
> > to each shard core with distrib=false.
> >
> > Here's a very incomplete list about how you can reduce Solr heap
> > requirements:
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems#
> > Reducing_heap_requirements
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
With Regards
Aman Tandon


Re: [ANN] Solr learning resources on safariflow.com (w/subscription or free trial)

2014-04-11 Thread Alexandre Rafalovitch
Looks nice. Would love to see the author-side usage/statistics too. To know
which chapters of my book were most useful/recommended.

Regards,
 Alex
On 11/04/2014 8:45 pm, "Michael Sokolov" 
wrote:

> I just wanted to let people know about some recent Solr books and videos
> that are now available at safariflow.com.  You can sign up for a free
> trial and get instant access, buy a subscription, or you may already be a
> subscriber.  I don't normally send out announcements like this, but because
> we just got an influx of new material, I thought people might be interested.
>
> Solr in Action (March 2014)
> http://www.safariflow.com/library/view/Solr-in-Action/9781617291029/
>
> Einführung in Apache Solr (March 2014)
> http://www.safariflow.com/library/view/Einf%25C3%25BChrung-in-Apache-Solr/
> 9783955614249/
>
> Apache Solr High Performance (March 2014)
> http://www.safariflow.com/library/view/Apache-Solr-High-
> Performance/9781782164821/
>
> Getting Started with Apache Solr Search Server (June 2013 video course):
> http://www.safariflow.com/library/view/Getting-started-
> with-Apache-Solr-Search-Server-%255BVideo%255D/9781782160847/
>
>
>
> In addition these are some other Solr and Lucene titles we have had for a
> little while:
>
>
>
> http://www.safariflow.com/library/view/Instant-Apache-
> Solr-for-Indexing-Data-How-to/9781782164845/
>
> http://www.safariflow.com/library/view/Apache-Solr-3-
> Enterprise-Search-Server/9781849516068/
>
> http://www.safariflow.com/library/view/Lucene-in-Action%
> 252C-Second-Edition/9781933988177/
>
>
>


Re: deleting large amount data from solr cloud

2014-04-11 Thread Vinay Pothnis
The query is something like this:


curl -H 'Content-Type: text/xml' --data '<delete><query>param1:(val1 OR
val2) AND -param2:(val3 OR val4) AND date_param:[138395520 TO
138516480]</query></delete>'
'http://host:port/solr/coll-name1/update?commit=true'

Trying to restrict the number of documents deleted via the date parameter.

Had not tried the "distrib=false" option. I could give that a try. Thanks
for the link! I will check on the cache sizes and autowarm values. Will try
and disable the caches when I am deleting and give that a try.

Thanks Erick and Shawn for your inputs!

-Vinay



On 11 April 2014 15:28, Shawn Heisey  wrote:

> On 4/10/2014 7:25 PM, Vinay Pothnis wrote:
>
>> When we tried to delete the data through a query - say 1 day/month's worth
>> of data. But after deleting just 1 month's worth of data, the master node
>> is going out of memory - heap space.
>>
>> Wondering is there any way to incrementally delete the data without
>> affecting the cluster adversely.
>>
>
> I'm curious about the actual query being used here.  Can you share it, or
> a redacted version of it?  Perhaps there might be a clue there?
>
> Is this a fully distributed delete request?  One thing you might try,
> assuming Solr even supports it, is sending the same delete request directly
> to each shard core with distrib=false.
>
> Here's a very incomplete list about how you can reduce Solr heap
> requirements:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#
> Reducing_heap_requirements
>
> Thanks,
> Shawn
>
>


Re: deleting large amount data from solr cloud

2014-04-11 Thread Shawn Heisey

On 4/10/2014 7:25 PM, Vinay Pothnis wrote:

When we tried to delete the data through a query - say 1 day/month's worth
of data. But after deleting just 1 month's worth of data, the master node
is going out of memory - heap space.

Wondering is there any way to incrementally delete the data without
affecting the cluster adversely.


I'm curious about the actual query being used here.  Can you share it, 
or a redacted version of it?  Perhaps there might be a clue there?


Is this a fully distributed delete request?  One thing you might try, 
assuming Solr even supports it, is sending the same delete request 
directly to each shard core with distrib=false.
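
If it does work, the request would look something along these lines - the core
name here is only a placeholder for your actual shard core names:

curl -H 'Content-Type: text/xml' \
     --data '<delete><query>your delete query here</query></delete>' \
     'http://host:port/solr/coll-name1_shard1_replica1/update?commit=true&distrib=false'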


Here's a very incomplete list about how you can reduce Solr heap 
requirements:


http://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements

Thanks,
Shawn



Re: deleting large amount data from solr cloud

2014-04-11 Thread Vinay Pothnis
Tried to increase the memory to 24G but that wasn't enough either.
Agree that the index has now grown too large and that we should have monitored
this and taken action much earlier.

The search operations seem to run ok with 16G - mainly because the bulk of
the data that we are trying to delete is not getting searched. So, now - we are
basically in salvage mode.

Does the number of documents deleted at a time have any impact? If I
'trickle delete' - say 50K documents at a time - would that make a
difference?

When I delete, does Solr try to bring the whole index into memory? I am trying to
understand what happens under the hood.
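
For example, I could break the date range into much smaller slices and issue one
delete per slice, committing in between - something roughly like this (placeholder
values, same shape as the delete query I posted earlier):

curl -H 'Content-Type: text/xml' \
     --data '<delete><query>param1:(val1 OR val2) AND -param2:(val3 OR val4) AND date_param:[SLICE_START TO SLICE_END]</query></delete>' \
     'http://host:port/solr/coll-name1/update?commit=true'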

Thanks
Vinay


On 11 April 2014 13:53, Erick Erickson  wrote:

> Using 16G for a 360G index is probably pushing things. A lot. I'm
> actually a bit surprised that the problem only occurs when you delete
> docs
>
> The simplest thing would be to increase the JVM memory. You should be
> looking at your index to see how big it is, be sure to subtract out
> the *.fdt and *.fdx files, those are used for verbatim copies of the
> raw data and don't really count towards the memory requirements.
>
> I suspect you're just not giving enough memory to your JVM and this is
> just the first OOM you've hit. Look on the Solr admin page and see how
> much is being reported, if it's near the limit of your 16G that's the
> "smoking gun"...
>
> Best,
> Erick
>
> On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis  wrote:
> > Sorry - yes, I meant to say leader.
> > Each JVM has 16G of memory.
> >
> >
> > On 10 April 2014 20:54, Erick Erickson  wrote:
> >
> >> First, there is no "master" node, just leaders and replicas. But that's
> a
> >> nit.
> >>
> >> No real clue why you would be going out of memory. Deleting a
> >> document, even by query should just mark the docs as deleted, a pretty
> >> low-cost operation.
> >>
> >> how much memory are you giving the JVM?
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis 
> wrote:
> >> > [solr version 4.3.1]
> >> >
> >> > Hello,
> >> >
> >> > I have a solr cloud (4 nodes - 2 shards) with a fairly large amount
> >> > documents (~360G of index per shard). Now, a major portion of the
> data is
> >> > not required and I need to delete those documents. I would need to
> delete
> >> > around 75% of the data.
> >> >
> >> > One of the solutions could be to drop the index completely re-index.
> But
> >> > this is not an option at the moment.
> >> >
> >> > When we tried to delete the data through a query - say 1 day/month's
> >> worth
> >> > of data. But after deleting just 1 month's worth of data, the master
> node
> >> > is going out of memory - heap space.
> >> >
> >> > Wondering is there any way to incrementally delete the data without
> >> > affecting the cluster adversely.
> >> >
> >> > Thank!
> >> > Vinay
> >>
>


Re: Solr Admin core status - Index is not "Current"

2014-04-11 Thread Chris W
Thanks, Shawn.


On Fri, Apr 11, 2014 at 11:11 AM, Shawn Heisey  wrote:

> On 4/10/2014 2:50 PM, Chris W wrote:
>
>> Hi there
>>
>>I am using solrcloud (4.3). I am trying to get the status of a core
>> from
>> solr using (localhost:8000/solr/admin/cores?action=STATUS&core=)
>> and
>> i get the following output
>>
>> 100
>> 102
>> 2
>> 20527
>> 20
>> *false*
>>
>>
>> What does current mean? A few of the cores are optimized (with segment
>> count 1) and show current = "true" and rest show current as false.
>>
>> If i have to make the core as current, what should i do? Is it a big alarm
>> if the value is false?
>>
>
> This basically means that Lucene has detected an index state where
> something has made changes to the index, but those changes are not yet
> visible.  To make them visible and return this status to 'true', do a
> commit or soft commit with openSearcher enabled.
>
> http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/
> index/DirectoryReader.html#isCurrent%28%29
>
> Thanks,
> Shawn
>
>


-- 
Best
-- 
C


Strange double-logging with log4j

2014-04-11 Thread Shawn Heisey
This is lucene_solr_4_7_2_r1586229, downloaded from the release 
manager's staging area.


I configured the following in my log4j.properties file:

log4j.rootLogger=WARN, file
log4j.category.org.apache.solr.core.SolrCore=INFO, file

Now EVERYTHING that SolrCore logs (which is all at INFO) is being logged 
twice.


Should I have done this differently, or is there a bug?
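
One thing I have not ruled out yet is log4j additivity: the SolrCore category has
its own "file" appender and, by default, its events also propagate up to the root
logger, which points at the same appender - which would produce exactly this kind
of duplication. If that is the cause, something like this should suppress it (a
guess, not yet verified on my setup):

log4j.rootLogger=WARN, file
log4j.category.org.apache.solr.core.SolrCore=INFO, file
# stop SolrCore events from also being handled by the root logger's appender
log4j.additivity.org.apache.solr.core.SolrCore=false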

I am using a container setup that is almost exactly like the example.  
The slf4j jars have been upgraded to 1.7.6 and jetty's jars have been 
upgraded to 8.1.14.


Thanks,
Shawn



Re: Solr doesn't load index at startup: out of memory

2014-04-11 Thread Erick Erickson
My assumption is that you've been adding documents and just have
finally run out of space.


Is that true?

Best,
Erick

On Fri, Apr 11, 2014 at 9:31 AM, Rafał Kuć  wrote:
> Hello!
>
> Do you have warming queries defined?
>
> --
> Regards,
>  Rafał Kuć
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>> Hi,
>> my solr (v. 4.5) after moths of work suddenly stopped to index: it responded
>> at the query but didn't index anymore new data. Here the error message:
>> ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException;
>> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot
>> commit
>> at
>> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788)
>
>> So, I restarted solr using more RAM (from 4GB until 8Gb) but now solr can't
>> load the cores. Here the error message:
>> ERROR - 2014-04-11 16:32:50.509;
>> org.apache.solr.core.CoreContainer; Unable
>> to create core: posts
>> org.apache.solr.common.SolrException: Error Instantiating Update Handler,
>> solr.DirectUpdateHandler2 failed to instantiate
>> org.apache.solr.update.UpdateHandler
>> ...
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>
>> Anyone can help me?
>
>
>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: deleting large amount data from solr cloud

2014-04-11 Thread Erick Erickson
Using 16G for a 360G index is probably pushing things. A lot. I'm
actually a bit surprised that the problem only occurs when you delete
docs

The simplest thing would be to increase the JVM memory. You should be
looking at your index to see how big it is, be sure to subtract out
the *.fdt and *.fdx files, those are used for verbatim copies of the
raw data and don't really count towards the memory requirements.

I suspect you're just not giving enough memory to your JVM and this is
just the first OOM you've hit. Look on the Solr admin page and see how
much is being reported, if it's near the limit of your 16G that's the
"smoking gun"...

Best,
Erick

On Fri, Apr 11, 2014 at 7:45 AM, Vinay Pothnis  wrote:
> Sorry - yes, I meant to say leader.
> Each JVM has 16G of memory.
>
>
> On 10 April 2014 20:54, Erick Erickson  wrote:
>
>> First, there is no "master" node, just leaders and replicas. But that's a
>> nit.
>>
>> No real clue why you would be going out of memory. Deleting a
>> document, even by query should just mark the docs as deleted, a pretty
>> low-cost operation.
>>
>> how much memory are you giving the JVM?
>>
>> Best,
>> Erick
>>
>> On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis  wrote:
>> > [solr version 4.3.1]
>> >
>> > Hello,
>> >
>> > I have a solr cloud (4 nodes - 2 shards) with a fairly large amount
>> > documents (~360G of index per shard). Now, a major portion of the data is
>> > not required and I need to delete those documents. I would need to delete
>> > around 75% of the data.
>> >
>> > One of the solutions could be to drop the index completely re-index. But
>> > this is not an option at the moment.
>> >
>> > When we tried to delete the data through a query - say 1 day/month's
>> worth
>> > of data. But after deleting just 1 month's worth of data, the master node
>> > is going out of memory - heap space.
>> >
>> > Wondering is there any way to incrementally delete the data without
>> > affecting the cluster adversely.
>> >
>> > Thank!
>> > Vinay
>>


Re: High CPU usage after import

2014-04-11 Thread Erick Erickson
Are you storing the data? That is, the raw binary of the MP3? B/c when
stored="true", Solr will try to compress the data, perhaps that's
what's driving the CPU utilization?

Easy test: set stored="false" for everything..

FWIW,
Erick

On Fri, Apr 11, 2014 at 5:23 AM, Александр Вандышев
 wrote:
> I realized what the problem was. One of the Solr threads freezes when
> importing
> MP3 files. When there are many such files Solr loads all processors. Is
> there a
> way to free thread?
>
> Re: High CPU usage after import That could mean that the code is hung
> somehow.
> Or, maybe Solr is just
> working on the commit. Unless you have an explicit commit, the automatic
> commit will occur some time after the extract request. How much data are we
> talking about?
>
> What does the Solr log say? Compare that to the case where CPU usage does
> settle down.
>
> -- Jack Krupansky
>
> -Original Message-
> From: Александр Вандышев
> Sent: Thursday, April 3, 2014 3:24 AM
> To: Solr User
> Subject: High CPU usage after import
>
> Thanks for the answer. I meant that the CPU does not free after the end of
> import.Tomtcat or Solr continue use it in max level.
>
> .
>
> Вт. 01 апр. 2014 20:09:24 пользователь Jack Krupansky
> (j...@basetechnology.com)
> написал:
>
>
> Some document types can consume significant CPU resources, such as large PDF
> files.
>
> -- Jack Krupansky
>
> -Original Message-
> From: Александр Вандышев
> Sent: Tuesday, April 1, 2014 9:28 AM
> To: Solr User
> Subject: High CPU usage after import
>
> I use a update/extract handler for indexing a large number of files. If
> during
> indexing a CPU loads was not maximum at the end of import loading decreases.
> If
> CPU loading was max then loading remain high. Who can help me?


Re: Solr Admin core status - Index is not "Current"

2014-04-11 Thread Shawn Heisey

On 4/10/2014 2:50 PM, Chris W wrote:

Hi there

   I am using solrcloud (4.3). I am trying to get the status of a core from
solr using (localhost:8000/solr/admin/cores?action=STATUS&core=) and
i get the following output

100
102
2
20527
20
*false*

What does current mean? A few of the cores are optimized (with segment
count 1) and show current = "true" and rest show current as false.

If i have to make the core as current, what should i do? Is it a big alarm
if the value is false?


This basically means that Lucene has detected an index state where 
something has made changes to the index, but those changes are not yet 
visible.  To make them visible and return this status to 'true', do a 
commit or soft commit with openSearcher enabled.


http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/index/DirectoryReader.html#isCurrent%28%29

Thanks,
Shawn



Re: Solr Admin core status - Index is not "Current"

2014-04-11 Thread Chris W
Any help on this is much appreciated. I cannot find any documentation
around this, and it would be good to understand what this means.


Thanks


On Thu, Apr 10, 2014 at 1:50 PM, Chris W  wrote:

> Hi there
>
>   I am using solrcloud (4.3). I am trying to get the status of a core from
> solr using (localhost:8000/solr/admin/cores?action=STATUS&core=) and
> i get the following output
>
> 100
> 102
> 2
> 20527
> 20
> *false*
>
> What does current mean? A few of the cores are optimized (with segment
> count 1) and show current = "true" and rest show current as false.
>
> If i have to make the core as current, what should i do? Is it a big alarm
> if the value is false?
>
> --
> Best
> --
> C
>



-- 
Best
-- 
C


RE: Relevance/Rank

2014-04-11 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, thanks Aman/Erick,

I moved part of the query under q=*:* and there is a difference in the score and
the order. It seems to work for me now. I will use this and move forward.

Thanks

Ravi

-Original Message-
From: Aman Tandon [mailto:amantandon...@gmail.com] 
Sent: Friday, April 11, 2014 12:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Relevance/Rank

It's fine Erick, I am guessing that maybe &fq=(SKU:204-161)... this SKU with
that value is present in all results, and that's why Name products are not getting
boosted.

Ravi: check your results without filtering - do all the results include
SKU:204-161?
I guess this may help.


On Fri, Apr 11, 2014 at 9:22 AM, Erick Erickson wrote:

> Aman:
>
> Oops, looked at the wrong part of the query, didn't see the bq clause.
> You're right of course. Sorry for the misdirection.
>
> Erick
>



--
With Regards
Aman Tandon


Re: Solr doesn't load index at startup: out of memory

2014-04-11 Thread Rafał Kuć
Hello!

Do you have warming queries defined? 

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


> Hi,
> my solr (v. 4.5) after moths of work suddenly stopped to index: it responded
> at the query but didn't index anymore new data. Here the error message:
> ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException;
> java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot
> commit
> at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788)

> So, I restarted solr using more RAM (from 4GB until 8Gb) but now solr can't
> load the cores. Here the error message:
> ERROR - 2014-04-11 16:32:50.509;
> org.apache.solr.core.CoreContainer; Unable
> to create core: posts
> org.apache.solr.common.SolrException: Error Instantiating Update Handler,
> solr.DirectUpdateHandler2 failed to instantiate
> org.apache.solr.update.UpdateHandler
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space

> Anyone can help me?



> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search a list of words and returned order

2014-04-11 Thread Jack Krupansky
Generally, the documents containing more of the terms should score higher 
and be returned first, but "relevancy" for some terms can skew that 
ordering, to some degree. What specific use cases are failing for you?


You can always add an additional optional subquery which is the AND of all 
terms and has a significant boost:


q=see spot run (+see +spot +run)^10

-- Jack Krupansky

-Original Message- 
From: Croci Francesco Luigi (ID SWS)

Sent: Friday, April 11, 2014 9:47 AM
To: 'solr-user@lucene.apache.org'
Subject: Search a list of words and returned order

When I search  for a list of words, per default Solr uses the OR operator.

In my case I index (pdfs) files. How/what can I do so that when I search the 
index for a list of words, I get the list of documents ordered first by  the 
ones that have all the words in them?


Thank you
Francesco 



Solr doesn't load index at startup: out of memory

2014-04-11 Thread Erik
Hi,
my Solr (v. 4.5), after months of work, suddenly stopped indexing: it responded
to queries but didn't index any new data anymore. Here is the error message:
ERROR - 2014-04-11 15:52:30.317; org.apache.solr.common.SolrException;
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot
commit
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2788)

So, I restarted Solr with more RAM (from 4GB up to 8GB) but now Solr can't
load the cores. Here is the error message:
ERROR - 2014-04-11 16:32:50.509; org.apache.solr.core.CoreContainer; Unable
to create core: posts
org.apache.solr.common.SolrException: Error Instantiating Update Handler,
solr.DirectUpdateHandler2 failed to instantiate
org.apache.solr.update.UpdateHandler
...
Caused by: java.lang.OutOfMemoryError: Java heap space

Can anyone help me?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-dosn-t-load-index-at-startup-out-of-memory-tp4130665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Class not found ICUFoldingFilter (SOLR-4852)

2014-04-11 Thread Shawn Heisey
On 4/11/2014 3:44 AM, ronak kirit wrote:
> I am facing the same issue discussed at SOLR-4852. I am getting below error:
> 
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.lucene.analysis.icu.ICUFoldingFilter
> at
> org.apache.lucene.analysis.icu.ICUFoldingFilterFactory.create(ICUFoldingFilterFactory.java:50)
>   at
> org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
> 
> 
> I am using solr-4.3.1. As discussed at SOLR-4852, I had all the jars at
> (SOLR_HOME)/lib and there is no reference to lib via any of solrconfig.xml
> or schema.xml.

I filed SOLR-4852.  Resource loading seems to be a black art with Solr!

The only jars you need for the ICU analysis components are
lucene-analyzers-icu-4.3.1.jar and icu4j-49.1.jar, possibly with
different version numbers in the names.

Are you defining solr.solr.home explicitly?  I'm just wondering if maybe
${solr.solr.home}/lib isn't where you think it is, or whether maybe
there's another copy of the jars somewhere on the classpath.  The log
should show which jars are loaded ... do you see either of the above
jars loaded more than once?  If you do, that seems to be the trigger for
the problem.  All but one copy needs to be removed.
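
If it helps, you can start Solr with solr.solr.home set explicitly and then grep
the startup log for the ICU jars - each one should be listed exactly once.
Roughly (the paths here are placeholders):

java -Dsolr.solr.home=/opt/solr/home -jar start.jar
# each jar added to the classloader is logged once; two hits means a duplicate copy
grep -iE "icu4j|analyzers-icu" logs/solr.log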

If the jars exist in the extracted WAR (the WEB-INF location you
mentioned), everything seems to work, but the problem with this is that
when you replace the .war file, your changes to the extracted war will
either be outdated or possibly will get removed.  It is good practice to
entirely remove the extracted .war contents when upgrading Solr.

Thanks,
Shawn



Re: deleting large amount data from solr cloud

2014-04-11 Thread Vinay Pothnis
Sorry - yes, I meant to say leader.
Each JVM has 16G of memory.


On 10 April 2014 20:54, Erick Erickson  wrote:

> First, there is no "master" node, just leaders and replicas. But that's a
> nit.
>
> No real clue why you would be going out of memory. Deleting a
> document, even by query should just mark the docs as deleted, a pretty
> low-cost operation.
>
> how much memory are you giving the JVM?
>
> Best,
> Erick
>
> On Thu, Apr 10, 2014 at 6:25 PM, Vinay Pothnis  wrote:
> > [solr version 4.3.1]
> >
> > Hello,
> >
> > I have a solr cloud (4 nodes - 2 shards) with a fairly large amount
> > documents (~360G of index per shard). Now, a major portion of the data is
> > not required and I need to delete those documents. I would need to delete
> > around 75% of the data.
> >
> > One of the solutions could be to drop the index completely re-index. But
> > this is not an option at the moment.
> >
> > When we tried to delete the data through a query - say 1 day/month's
> worth
> > of data. But after deleting just 1 month's worth of data, the master node
> > is going out of memory - heap space.
> >
> > Wondering is there any way to incrementally delete the data without
> > affecting the cluster adversely.
> >
> > Thank!
> > Vinay
>


RE: Were changes made to facetting on multivalued fields recently?

2014-04-11 Thread Jean-Sebastien Vachon
Thanks to both of you. I finally found the issue and you were right (again) ;)

The problem was not coming from the full indexing code containing the SQL
replace statement, but from another process whose job is to keep our index
up to date. This process had no idea that commas were to be replaced by spaces
for some fields (and it should not have to know about this either).

I changed the Tokenizer used for the field to the following and everything is 
fine now.
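
Roughly, the new definition just splits the raw "4,1"-style values on the commas
at index time - something along these lines (the type name is only an example):

<fieldType name="commaSeparated" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split the incoming comma-separated values directly on the commas -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
  </analyzer>
</fieldType>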


Thanks for your help

> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: April-10-14 1:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Were changes made to facetting on multivalued fields recently?
> 
> bq: The SQL query contains a Replace statement that does this
> 
> Well, I suspect that's where the issue is. The facet values being reported
> include:
> 134826
> which indicates that the incoming text to Solr still has the commas.
> Solr is seeing the commas and all.
> 
> You can cure this by using PatternReplaceCharFilterFactory and doing the
> substitution at index time if you want to.
> 
> That doesn't clarify why the behavior has changed though, but my
> supposition is that it has nothing to do with Solr, and something about your
> SQL statement is different.
> 
> Best,
> Erick
> 
> On Thu, Apr 10, 2014 at 9:33 AM, Jean-Sebastien Vachon  sebastien.vac...@wantedanalytics.com> wrote:
> > The SQL query contains a Replace statement that does this
> >
> >> -Original Message-
> >> From: Shawn Heisey [mailto:s...@elyograg.org]
> >> Sent: April-10-14 11:30 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Were changes made to facetting on multivalued fields
> recently?
> >>
> >> On 4/10/2014 9:14 AM, Jean-Sebastien Vachon wrote:
> >> > Here are the field definitions for both our old and new index... as
> >> > you can
> >> see that are identical. We've been using this chain and field type
> >> starting with Solr 1.4 and never had any problem. As for the
> >> documents, both indexes are using the same data source. They could be
> >> slightly out of sync from time to time but we tend to index them on a
> >> daily basis. Both indexes are also using the same code (indexing through
> SolrJ) to index their content.
> >> >
> >> > The source is a column in MySql that contains entries such as "4,1"
> >> > that get stored in a Multivalued fields after replacing commas by
> >> > spaces
> >> >
> >> > OLD (4.6.1):
> >> > >> positionIncrementGap="100">
> >> >   
> >> > 
> >> >   
> >> > 
> >> >
> >> >  >> > stored="true" required="false" multiValued="true" />
> >>
> >> Just so you know, there's nothing here that would require the field
> >> to be multivalued.  WhitespaceTokenizerFactory does not create
> >> multiple field values, it creates multiple terms.  If you are
> >> actually inserting multiple values for the field in SolrJ, then you would
> need a multivalued field.
> >>
> >> What is replacing the commas with spaces?  I don't see anything here
> >> that would do that.  It sounds like that part of your indexing is not
> working.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> 


Search a list of words and returned order

2014-04-11 Thread Croci Francesco Luigi (ID SWS)
When I search  for a list of words, per default Solr uses the OR operator.

In my case I index (PDF) files. What can I do so that, when I search the
index for a list of words, the documents that contain all of the words are
returned first?

Thank you
Francesco


[ANN] Solr learning resources on safariflow.com (w/subscription or free trial)

2014-04-11 Thread Michael Sokolov
I just wanted to let people know about some recent Solr books and videos 
that are now available at safariflow.com.  You can sign up for a free 
trial and get instant access, buy a subscription, or you may already be 
a subscriber.  I don't normally send out announcements like this, but 
because we just got an influx of new material, I thought people might be 
interested.


Solr in Action (March 2014)
http://www.safariflow.com/library/view/Solr-in-Action/9781617291029/

Einführung in Apache Solr (March 2014)
http://www.safariflow.com/library/view/Einf%25C3%25BChrung-in-Apache-Solr/9783955614249/

Apache Solr High Performance (March 2014)
http://www.safariflow.com/library/view/Apache-Solr-High-Performance/9781782164821/

Getting Started with Apache Solr Search Server (June 2013 video course):
http://www.safariflow.com/library/view/Getting-started-with-Apache-Solr-Search-Server-%255BVideo%255D/9781782160847/



In addition these are some other Solr and Lucene titles we have had for 
a little while:




http://www.safariflow.com/library/view/Instant-Apache-Solr-for-Indexing-Data-How-to/9781782164845/

http://www.safariflow.com/library/view/Apache-Solr-3-Enterprise-Search-Server/9781849516068/

http://www.safariflow.com/library/view/Lucene-in-Action%252C-Second-Edition/9781933988177/




Re: High CPU usage after import

2014-04-11 Thread Александр Вандышев

I realized what the problem was. One of the Solr threads freezes when importing
MP3 files. When there are many such files, Solr loads all the processors. Is there
a way to free the thread?

Re: High CPU usage after import That could mean that the code is hung somehow.
Or, maybe Solr is just
working on the commit. Unless you have an explicit commit, the automatic
commit will occur some time after the extract request. How much data are we
talking about?

What does the Solr log say? Compare that to the case where CPU usage does
settle down.

-- Jack Krupansky

-Original Message-
From: Александр Вандышев
Sent: Thursday, April 3, 2014 3:24 AM
To: Solr User
Subject: High CPU usage after import

Thanks for the answer. I meant that the CPU is not freed after the end of the
import. Tomcat or Solr continues to use it at the max level.

.

Вт. 01 апр. 2014 20:09:24 пользователь Jack Krupansky
(j...@basetechnology.com)
написал:


Some document types can consume significant CPU resources, such as large PDF
files.

-- Jack Krupansky

-Original Message-
From: Александр Вандышев
Sent: Tuesday, April 1, 2014 9:28 AM
To: Solr User
Subject: High CPU usage after import

I use the update/extract handler for indexing a large number of files. If the
CPU load was not at maximum during indexing, the load decreases at the end of
the import. If the CPU load was at maximum, then the load remains high. Who can
help me?


Re: DataImportHandler - Automatic scheduling of delta imports in Solr in windows 7

2014-04-11 Thread harshrossi
Yes, that is all fine with me. The only thing that worries me is what needs to be
coded in the batch file.
I will just try a sample batch file and get back with queries if any.
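
For a start I was thinking of something as simple as the following - just hitting
the DataImportHandler endpoint from the scheduled batch file (the host and core
name are placeholders):

REM delta-import.bat - run a DIH delta import on a schedule (e.g. via Task Scheduler)
curl "http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false&commit=true"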

Thank you 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-Automatic-scheduling-of-delta-imports-in-Solr-in-windows-7-tp4130565p4130635.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fails to index if unique field has special characters

2014-04-11 Thread Markus Jelsma
Well, this is somewhat of a problem if you have URLs as the uniqueKey that
contain exclamation marks. Isn't it an idea to allow those to be escaped and
thus ignored by the CompositeIdRouter?

On Friday, April 11, 2014 11:43:31 AM Cool Techi wrote:
> Thanks, that was helpful.
> Regards,Rohit
> 
> > Date: Thu, 10 Apr 2014 08:44:36 -0700
> > From: iori...@yahoo.com
> > Subject: Re: Fails to index if unique field has special characters
> > To: solr-user@lucene.apache.org
> > 
> > Hi Ayush,
> > 
> > I think this
> > 
> > ""IBM!12345". The exclamation mark ('!') is critical here, as it
> > distinguishes the prefix used to determine which shard to direct the
> > document to."
> > 
> > https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+
> > in+SolrCloud
> > 
> > 
> > 
> > 
> > On Thursday, April 10, 2014 2:35 PM, Cool Techi 
> > wrote: Hi,
> > We are migrating from Solr 4.6 standalone to Solr 4.7 cloud version, while
> > reindexing the document we are getting the following error. This is
> > happening when the unique key has special character, this was not noticed
> > in version 4.6 standalone mode, so we are not sure if this is a version
> > problem or a cloud issue. Example of the unique key is given below,
> > http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_
> > live_stream_on_line_nfl_football_free_video_broadcast_B142707.html
> > Exception Stack Trace
> > ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException;
> > java.lang.ArrayIndexOutOfBoundsException: 2   at
> > org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(Composit
> > eIdRouter.java:296)   at
> > org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRoute
> > r.java:58)   at
> > org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRout
> > er.java:33)   at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(
> > DistributedUpdateProcessor.java:218)   at
> > org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(Di
> > stributedUpdateProcessor.java:550)   at
> > org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateP
> > rocessorFactory.java:100)   at
> > org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247
> > )   at
> > org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)  
> > at>
> > 
> >org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.j
> >ava:92)   at
> >org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Content
> >StreamHandlerBase.java:74)   at
> >org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBas
> >e.java:135)   at
> >org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)   at
> >org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java
> >:780)   at
> >org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav
> >a:427)   at
> >org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav
> >a:217)   at
> >org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandl
> >er.java:1419)   at
> >org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >   at
> >org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:1
> >37)   at
> >org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557
> >)   at org.eclipse.jetty.server.session.SessionHandle>
> > Thanks,Ayush  
> 
>  



RE: Fails to index if unique field has special characters

2014-04-11 Thread Cool Techi

Thanks, that was helpful.
Regards,Rohit
> Date: Thu, 10 Apr 2014 08:44:36 -0700
> From: iori...@yahoo.com
> Subject: Re: Fails to index if unique field has special characters
> To: solr-user@lucene.apache.org
> 
> Hi Ayush,
> 
> I think this
> 
> ""IBM!12345". The exclamation mark ('!') is critical here, as it 
> distinguishes the prefix used to determine which shard to direct the document 
> to."
> 
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> 
> 
> 
> 
> On Thursday, April 10, 2014 2:35 PM, Cool Techi  
> wrote:
> Hi,
> We are migrating from Solr 4.6 standalone to Solr 4.7 cloud version, while 
> reindexing the document we are getting the following error. This is happening 
> when the unique key has special character, this was not noticed in version 
> 4.6 standalone mode, so we are not sure if this is a version problem or a 
> cloud issue. Example of the unique key is given below,
> http://www.mynews.in/Blog/smrity!!**)))!miami_dolphins_vs_dallas_cowboys_live_stream_on_line_nfl_football_free_video_broadcast_B142707.html
> Exception Stack Trace
> ERROR - 2014-04-10 10:51:44.361; org.apache.solr.common.SolrException; 
> java.lang.ArrayIndexOutOfBoundsException: 2   at 
> org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:296)
>at 
> org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58)
>at 
> org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218)
>at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
>at 
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>at 
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)   
> at
>  
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) 
>   at
>  org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)  
>  at org.eclipse.jetty.server.session.SessionHandle
> 
> Thanks,Ayush   
  

SOLR problem with full-import and shards

2014-04-11 Thread Wojciech Jaworski
Hi,



I built an Apache SOLR cloud (version 4.7.0) with 3 shards. I chose the
implicit routing mechanism while creating the new collection (one shard per
month, with a date-format field (MM) used as the shardId). I configured
DataImportHandler with a database as the data source.
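
(For context, the collection was created along these lines - the names and shard
ids below are placeholders, not the exact command I used:)

http://host:port/solr/admin/collections?action=CREATE&name=trans_implicit
    &router.name=implicit&shards=201401,201402,201403&router.field=month_field
    &collection.configName=myconf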



Finally I ran a full-import (data from 3 months is present in the database) on
the shard leader of the first month's shard. Although I received a success
message on the web page, only the data for the first shard was indexed (the data
from the first month, which is ok; the data from the other two months, which
should have gone to the other two shards, was not indexed anywhere).



I checked logs and spotted hundreds of errors:



WARN  - 2014-04-11 10:55:33.921;
org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending
update

org.apache.solr.common.SolrException: Bad Request



request: http://
:/solr/trans_implicit/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F://%3A%2Fsolr%2Ftrans_implicit%2F&wt=javabin&version=2

at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:722)



Can anyone help? I would really appreciate any suggestions.



Regards,

Wojtek Jaworski


highlighting displays to much

2014-04-11 Thread aowen
I am using Solr 4.3.1 and want to highlight complete sentences if possible, or
at least not cut up words. If it finds something, the whole field is displayed
instead of only 180 chars.

the field is:

  

solrconfig setting for highlighting:
   true
   plain_text title description
   
   
   5
   180
   regex
   0.2
   \w[^\.!\?]{20,160}
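
(In solrconfig.xml terms, the defaults I set correspond to roughly the following -
reconstructed here, so the exact tag names may not match my file:)

   <str name="hl">true</str>
   <str name="hl.fl">plain_text title description</str>
   <int name="hl.snippets">5</int>
   <int name="hl.fragsize">180</int>
   <str name="hl.fragmenter">regex</str>
   <float name="hl.regex.slop">0.2</float>
   <str name="hl.regex.pattern">\w[^\.!\?]{20,160}</str>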


Class not found ICUFoldingFilter (SOLR-4852)

2014-04-11 Thread ronak kirit
Hello,

I am facing the same issue discussed at SOLR-4852. I am getting below error:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class
org.apache.lucene.analysis.icu.ICUFoldingFilter
at
org.apache.lucene.analysis.icu.ICUFoldingFilterFactory.create(ICUFoldingFilterFactory.java:50)
  at
org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)


I am using solr-4.3.1. As discussed at SOLR-4852, I had all the jars at
(SOLR_HOME)/lib and there is no reference to lib via any of solrconfig.xml
or schema.xml.

I have also tried setting "sharedLib=foo", but that also didn't work.
However, if I remove all of the below files:

icu4j-49.1.jar

lucene-analyzers-morfologik-4.3.1.jar

lucene-analyzers-stempel-4.3.1.jar

solr-analysis-extras-4.3.1.jar

lucene-analyzers-icu-4.3.1.jar

lucene-analyzers-smartcn-4.3.1.jar

lucene-analyzers-uima-4.3.1.jar

from $(solrhome)/lib and move them to solr-webapp/webapp/WEB-INF/lib, things are
working fine.

Any guess? Any help?

Thanks,

Ronak


Re: Solr relevancy tuning

2014-04-11 Thread Giovanni Bricconi
Hello Doug

I have just watched the Quepid demonstration video, and I strongly agree
with your introduction: it is very hard to involve marketing/business
people in repeated testing sessions, and spreadsheets or other kinds of files
are not the right tool to use.
Currently I'm quite alone in my tuning task, and having a visual approach
could be beneficial for me; you are giving me many good inputs!

I see that kelvin (my scripted tool) and Quepid follow the same path. In
Quepid someone quickly watches the results and applies colours to them;
in kelvin you enter one or more queries (network cable, ethernet cable) and
state that the results must contain ethernet in the title, or must come
from a list of product categories.

I also do diffs of results, before and after changes, to check what is
going on; but I have to do that in a very unix-scripted way.

Have you considered placing a counter of total red/bad results in
Quepid? I use this figure to get a quick overview of the impact of changes across
all queries. Actually I repeat the tests in production from time to time, and
if I see the "kelvin temperature" rising (the number of errors going up) I
know I have to check what's going on, because new products may be having
a bad impact on the index.

I also keep counters of products with low-quality images/no images at all
or too-short listings; they are sometimes useful to better understand what will
happen if you change some bq/fq in the application.

I also see that after changes in Quepid someone has to check "gray"
results and assign them a colour; in kelvin's case sometimes the conditions
can do a bit of magic (new product names still contain SM-G900F) but
sometimes they can introduce false errors (the new product name contains only
Galaxy 5 and not the product code SM-G900F). So some checks are needed, but
with Quepid everybody can do the check, whereas with kelvin you have to change
some lines of a script, and not everybody is able/willing to do that.

The idea of a static index is a good suggestion; I will try to have one in
the next round of search engine improvements.

Thank you Doug!




2014-04-09 17:48 GMT+02:00 Doug Turnbull <
dturnb...@opensourceconnections.com>:

> Hey Giovanni, nice to meet you.
>
> I'm the person that did the Test Driven Relevancy talk. We've got a product
> Quepid (http://quepid.com) that lets you gather good/bad results for
> queries and do a sort of test driven development against search relevancy.
> Sounds similar to your existing scripted approach. Have you considered
> keeping a static catalog for testing purposes? We had a project with a lot
> of updates and date-dependent relevancy. This lets you create some test
> scenarios against a static data set. However, one downside is you can't
> recreate problems in production in your test setup exactly-- you have to
> find a similar issue that reflects what you're seeing.
>
> Cheers,
> -Doug
>
>
> On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi <
> giovanni.bricc...@banzai.it> wrote:
>
> > Thank you for the links.
> >
> > The book is really useful, I will definitively have to spend some time
> > reformatting the logs to to access number of result founds, session id
> and
> > much more.
> >
> > I'm also quite happy that my test cases produces similar results to the
> > precision reports shown at the beginning of the book.
> >
> > Giovanni
> >
> >
> > 2014-04-09 12:59 GMT+02:00 Ahmet Arslan :
> >
> > > Hi Giovanni,
> > >
> > > Here are some relevant pointers :
> > >
> > >
> > >
> >
> http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
> > >
> > >
> > > http://rosenfeldmedia.com/books/search-analytics/
> > >
> > > http://www.sematext.com/search-analytics/index.html
> > >
> > >
> > > Ahmet
> > >
> > >
> > > On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi <
> > > giovanni.bricc...@banzai.it> wrote:
> > > It is about one year I'm working on an e-commerce site, and
> > unfortunately I
> > > have no "information retrieval" background, so probably I am missing
> some
> > > important practices about relevance tuning and search engines.
> > > During this period I had to fix many "bugs" about bad search results,
> > which
> > > I have solved sometimes tuning edismax weights, sometimes creating ad
> hoc
> > > query filters or query boosting; but I am still not able to figure out
> > what
> > > should be the correct process to improve search results relevance.
> > >
> > > These are the practices I am following, I would really appreciate any
> > > comments about them and any hints about what practices you follow in
> your
> > > projects:
> > >
> > > - In order to have a measure of search quality I have written many test
> > > cases such as "if the user searches for <> the search
> > > result should display at least four <> products with the words
> > > <> and <> in the title". I have written a tool that
> > read
> > > such tests from json files and applies them to my appli

Re: Shared Stored Field

2014-04-11 Thread StrW_dev
Erick Erickson wrote
> So you're saying that you have B_1 - B_8 in one doc, B_9 - B_16 in
> another doc etc?

Well yes, that could work, but this would mean we get a lot of unique dynamic
fields, basically equal to the number of documents in our system, and I am
not sure if that is good practice.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shared-Stored-Field-tp4130351p4130589.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-11 Thread Dmitry Kan
Thanks Shawn,

perhaps the comment on the luceneMatchVersion in the example schema.xml
could be changed to reflect / clarify this?

  

this comment made me think that the parameter is affecting the index side
of things too (aka index format version). I.e. I would appreciate seeing
there things like you just mentioned regarding emulated behaviour. So we
could draw a line between the index format (low-level, not controllable by
a user) and analysis chain etc (solr config level, user controllable).

I have tried specifying the postingsFormat on a per field type basis. For
postingsFormat="Lucene40" I get:

org.apache.solr.client.solrj.SolrServerException:
java.lang.UnsupportedOperationException: this codec can only be used for
reading
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 12 more
Caused by: java.lang.UnsupportedOperationException: this codec can only be
used for reading
at
org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:246)
at
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:131)
at
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:336)
at
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
at
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:465)
at
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506)
at
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:616)
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2864)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3022)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2989)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:578)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1457)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1434)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
... 12 more
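
For reference, the declaration that triggers this is of the following form (a
simplified sketch, with solr.SchemaCodecFactory enabled in solrconfig.xml, not my
full schema):

<codecFactory class="solr.SchemaCodecFactory"/>
<fieldType name="string_l40" class="solr.StrField" postingsFormat="Lucene40"/>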

But that is just a side note.

I have added a comment to the cwiki regarding the possible values for
postingsFormat parameter (currently values marked as n/a):
https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

Dmitry



On Fri, Apr 11, 2014 at 10:42 AM, Shawn Heisey  wrote:

> On 4/11/2014 12:42 AM, Dmitry Kan wrote:
> > Thanks! So solr 4.7 does not seem to respect the luceneMatchVersion on
> the
> > binary (index) level. Or perhaps, I misunderstand the meaning of the
> > luceneMatchVersion.
>
> luceneMatchVersion does not dictate the index format.  It is a way to
> signal things like analysis components that they should emulate behavior
> (sometimes buggy) found in an earlier version.  Not all analysis
> components will operate differently when this config is used.  There is
> probably not a central repository of how the version affects Solr/Lucene
> behavior.
>
> > I wonder whether there is any possibility of defining the version of the
> > codec in solr config/schema.
>
> I don't think Solr exposes any way to define an entire codec.  You can
> change things individually, like the postings format and docValues
> format on a field, but there's no way (that I know of) to define an
> entire codec.  The overall index format is not something you can specify.
>
> I think it could be possible to come up with some XML syntax for
> describing a complete codec and then write code to parse it and build
> the codec ... but because my understanding of how all the Lucene pieces
> fit together is relatively low, there may be some really good reason
> that Solr doesn't offer this functionality.
>
> Thanks,
> Shawn
>
>


-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: MapReduceIndexerTool does not respect Lucene version in solrconfig Was: converting 4.7 index to 4.3.1

2014-04-11 Thread Shawn Heisey
On 4/11/2014 12:42 AM, Dmitry Kan wrote:
> Thanks! So solr 4.7 does not seem to respect the luceneMatchVersion on the
> binary (index) level. Or perhaps, I misunderstand the meaning of the
> luceneMatchVersion.

luceneMatchVersion does not dictate the index format.  It is a way to
signal things like analysis components that they should emulate behavior
(sometimes buggy) found in an earlier version.  Not all analysis
components will operate differently when this config is used.  There is
probably not a central repository of how the version affects Solr/Lucene
behavior.

> I wonder whether there is any possibility of defining the version of the
> codec in solr config/schema.

I don't think Solr exposes any way to define an entire codec.  You can
change things individually, like the postings format and docValues
format on a field, but there's no way (that I know of) to define an
entire codec.  The overall index format is not something you can specify.

I think it could be possible to come up with some XML syntax for
describing a complete codec and then write code to parse it and build
the codec ... but because my understanding of how all the Lucene pieces
fit together is relatively low, there may be some really good reason
that Solr doesn't offer this functionality.

Thanks,
Shawn



Re: Pushing content to Solr from Nutch

2014-04-11 Thread Furkan KAMACI
Hi Xavier;

I think that it is better to ask this question on the Nutch user list.

Thanks;
Furkan KAMACI


2014-04-11 7:52 GMT+03:00 Jack Krupansky :

> Does your Solr schema match the data output by nutch? It's up to you to
> create a Solr schema that matches the output of nutch - read up on the
> nutch doc for that info. Solr doesn't define that info, nutch does.
>
> -- Jack Krupansky
>
> From: Xavier Morera
> Sent: Thursday, April 10, 2014 12:58 PM
> To: solr-user@lucene.apache.org
> Subject: Pushing content to Solr from Nutch
>
> Hi,
>
> I have followed several Nutch tutorials - including the main one
> http://wiki.apache.org/nutch/NutchTutorial - to crawl sites (which works,
> I can see in the console as the pages get crawled and the directories built
> with the data) but for the life of me I can't get anything posted to Solr.
> The Solr console doesn't even squint, therefore Nutch is not sending
> anything.
>
> This is the command that I send over that crawls and in theory should also
> post
> bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr 2
>
>
> But I found that I could also use this one when it is already crawled
> bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb
> crawl/segments/*
>
>
> But no luck.
>
> This is the only thing that caught my attention, but I read that adding
> the property below would fix it - yet it doesn't work.
> No IndexWriters activated - check your configuration
>
>
> This is the property
> 
> plugin.includes
>
> protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)
> 
>
> Any idea? Apache Nutch 1.8 running Java 1.6 via Cygwin on Windows.
>
> --
>
> Xavier Morera
> email: xav...@familiamorera.com
>
> CR: +(506) 8849 8866
> US: +1 (305) 600 4919
> skype: xmorera
>