Re: unable to load core after cluster restart

2013-11-02 Thread kaustubh147
Hi Shawn,

One thing I forgot to mention is that the same setup (with no bootstrap) is
working fine in our QA1 environment. I did not have the bootstrap option
from the start; I added it thinking it would solve the problem.

Nonetheless, I followed Shawn's instructions wherever they differed from my
old approach:
1. I moved zkHost from the JVM options into solr.xml and added a chroot to it
   (sketched below)
2. removed the bootstrap option
3. created the collections with the suggested Collections API URL (I had
   tried that earlier too)
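
For reference, this is roughly what the relevant pieces look like now (host
names, the chroot and the shard counts below are placeholders, not my real
values; this assumes the new-style solr.xml):

  <solr>
    <solrcloud>
      <str name="zkHost">zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solrcloud</str>
      ...
    </solrcloud>
  </solr>

and the collections were created with a Collections API call along these lines:

  http://host1:8983/solr/admin/collections?action=CREATE&name=xyz&numShards=2&replicationFactor=2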

None of it worked for me; I am still seeing the same errors. I am adding
some more logs from before and after the error occurs.


-

INFO  - 2013-11-02 17:40:40.427; org.apache.solr.update.DefaultSolrCoreState; closing IndexWriter with IndexWriterCloser
INFO  - 2013-11-02 17:40:40.428; org.apache.solr.core.SolrCore; [xyz] Closing main searcher on request.
INFO  - 2013-11-02 17:40:40.431; org.apache.solr.core.CachingDirectoryFactory; Closing NRTCachingDirectoryFactory - 1 directories currently being tracked
INFO  - 2013-11-02 17:40:40.432; org.apache.solr.core.CachingDirectoryFactory; looking to close /mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data [CachedDir<>]
INFO  - 2013-11-02 17:40:40.432; org.apache.solr.core.CachingDirectoryFactory; Closing directory: /mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data
ERROR - 2013-11-02 17:40:40.433; org.apache.solr.core.CoreContainer; Unable to create core: xyz
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
    ... 13 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:695)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:267)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1440)
    ... 15 more
ERROR - 2013-11-02 17:40:40.443; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: xyz
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:934)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:566)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkCont

Writing a Solr custom analyzer to post content to Stanbol {was: Need additional data processing in Data Import Handler prior to indexing}

2013-11-02 Thread Dileepa Jayakody
Hi All,

I went through possible solutions for my requirement of triggering a
Stanbol enhancement during Solr indexing, and I have simplified the
requirement.

I only need to run the Stanbol enhancement on the field named "content" to
extract Person and Organization entities.
So I think it will be easier to make the Stanbol request while indexing the
"content" field, after the data is imported (from DIH).

I think the best solution will be to write a custom Analyzer that processes
the content and posts it to Stanbol.
In the analyzer I also need to process the Stanbol enhancement response.
The response should be processed so that the identified Person and
Organization entities are indexed and stored in a field called
"extractedEntities".

So my current idea is as follows:

In the schema.xml:
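
Roughly, what I have in mind is along these lines (the field type and
analyzer class names are just placeholders for illustration):

  <fieldType name="stanbolEnhancedText" class="solr.TextField">
    <analyzer class="com.example.MyCustomAnalyzer"/>
  </fieldType>

  <field name="content" type="stanbolEnhancedText" indexed="true" stored="true"/>
  <field name="extractedEntities" type="string" indexed="true" stored="true"
         multiValued="true"/>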
In the MyCustomAnalyzer class, the content will be posted to Stanbol and
enhanced. The Person and Organization entities in the response should then
be indexed into the Solr field "extractedEntities".
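
Roughly, the call I have in mind from inside the analyzer is something like
this (the Stanbol endpoint, chain name and response handling below are just
placeholders on my side, not something I have working yet):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StanbolClient {

    // Rough sketch: POST the text of the "content" field to a Stanbol
    // enhancer chain and return the raw enhancement response for parsing.
    public static String enhance(String content) throws IOException {
        // placeholder endpoint/chain
        URL url = new URL("http://localhost:8080/enhancer/chain/default");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain");
        conn.setRequestProperty("Accept", "application/json");
        conn.getOutputStream().write(content.getBytes("UTF-8"));

        StringBuilder response = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            response.append(line);
        }
        in.close();
        // TODO: pull the Person / Organization entities out of the response
        return response.toString();
    }
}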
Am I on the right track for my requirement? Please share your ideas.
I would appreciate any relevant pointers to samples/documentation.

Thanks,
Dileepa

On Wed, Oct 30, 2013 at 11:26 AM, Dileepa Jayakody <
dileepajayak...@gmail.com> wrote:

> Thanks guys for your ideas.
>
> I will go through them and come back with questions.
>
> Regards,
> Dileepa
>
>
> On Wed, Oct 30, 2013 at 7:00 AM, Erick Erickson 
> wrote:
>
>> Third time tonight I've been able to paste this link
>>
>> Also, you can consider just moving to SolrJ and
>> taking DIH out of the process, see:
>> http://searchhub.org/2012/02/14/indexing-with-solrj/
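>>
>> A minimal SolrJ sketch (untested, field and URL values made up) looks
>> about like this:
>>
>> import org.apache.solr.client.solrj.SolrServer;
>> import org.apache.solr.client.solrj.impl.HttpSolrServer;
>> import org.apache.solr.common.SolrInputDocument;
>>
>> public class SolrJIndexer {
>>     public static void main(String[] args) throws Exception {
>>         SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
>>         SolrInputDocument doc = new SolrInputDocument();
>>         doc.addField("id", "1");
>>         doc.addField("content", "text pulled from MySQL");
>>         // this is where you could call Stanbol and add the entities:
>>         // doc.addField("extractedEntities", ...);
>>         server.add(doc);
>>         server.commit();
>>     }
>> }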
>>
>> Whichever approach fits your needs of course.
>>
>> Best,
>> Erick
>>
>>
>> On Tue, Oct 29, 2013 at 7:15 PM, Alexandre Rafalovitch
>> wrote:
>>
>> > It's also possible to combine an Update Request Processor with DIH. That
>> > way, if a debug entry needs to be inserted, it could go through the same
>> > Stanbol process.
>> >
>> > Just define a processing chain in the DIH handler and write a custom URP
>> > to call out to the Stanbol web service. You have access to the full record
>> > in a URP, so you can add/delete/change the fields at will.
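>> >
>> > The skeleton of such a URP would look roughly like this (an untested
>> > sketch; the actual Stanbol call is left out):
>> >
>> > import java.io.IOException;
>> >
>> > import org.apache.solr.common.SolrInputDocument;
>> > import org.apache.solr.request.SolrQueryRequest;
>> > import org.apache.solr.response.SolrQueryResponse;
>> > import org.apache.solr.update.AddUpdateCommand;
>> > import org.apache.solr.update.processor.UpdateRequestProcessor;
>> > import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
>> >
>> > public class StanbolEnhancerProcessorFactory extends UpdateRequestProcessorFactory {
>> >     @Override
>> >     public UpdateRequestProcessor getInstance(SolrQueryRequest req,
>> >             SolrQueryResponse rsp, UpdateRequestProcessor next) {
>> >         return new StanbolEnhancerProcessor(next);
>> >     }
>> > }
>> >
>> > class StanbolEnhancerProcessor extends UpdateRequestProcessor {
>> >
>> >     StanbolEnhancerProcessor(UpdateRequestProcessor next) {
>> >         super(next);
>> >     }
>> >
>> >     @Override
>> >     public void processAdd(AddUpdateCommand cmd) throws IOException {
>> >         SolrInputDocument doc = cmd.getSolrInputDocument();
>> >         Object content = doc.getFieldValue("content");
>> >         if (content != null) {
>> >             // call Stanbol with the content here and add whatever
>> >             // Person/Organization entities come back, e.g.:
>> >             // doc.addField("extractedEntities", entityName);
>> >         }
>> >         super.processAdd(cmd);
>> >     }
>> > }
>> >
>> > Then reference the factory in a processor chain in solrconfig.xml and
>> > point the DIH handler's update.chain at it.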
>> >
>> > Regards,
>> >Alex.
>> >
>> > Personal website: http://www.outerthoughts.com/
>> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> > - Time is the quality of nature that keeps events from happening all at
>> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>> >
>> >
>> > On Wed, Oct 30, 2013 at 4:09 AM, Michael Della Bitta <
>> > michael.della.bi...@appinions.com> wrote:
>> >
>> > > Hi Dileepa,
>> > >
>> > > You can write your own Transformers in Java. If it doesn't make sense
>> > > to run Stanbol calls in a Transformer, maybe setting up a web service
>> > > that grabs a record out of MySQL, sends the data to Stanbol, and
>> > > displays the results could be used in conjunction with HttpDataSource
>> > > rather than JdbcDataSource.
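>> > >
>> > > A custom Transformer is pretty small; here is a rough sketch, with the
>> > > Stanbol call itself omitted:
>> > >
>> > > import java.util.Map;
>> > >
>> > > import org.apache.solr.handler.dataimport.Context;
>> > > import org.apache.solr.handler.dataimport.Transformer;
>> > >
>> > > public class StanbolTransformer extends Transformer {
>> > >     @Override
>> > >     public Object transformRow(Map<String, Object> row, Context context) {
>> > >         Object content = row.get("content");
>> > >         if (content != null) {
>> > >             // call Stanbol with the content and put the extracted
>> > >             // entities into a new column for the "extractedEntities" field
>> > >             row.put("extractedEntities", "TODO: entities from Stanbol");
>> > >         }
>> > >         return row;
>> > >     }
>> > > }
>> > >
>> > > and reference it on the entity in db-data-config.xml with
>> > > transformer="com.example.StanbolTransformer".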
>> > >
>> > > http://wiki.apache.org/solr/DIHCustomTransformer
>> > >
>> > > http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2FHTTP_Datasource
>> > >
>> > > Michael Della Bitta
>> > >
>> > > Applications Developer
>> > >
>> > > o: +1 646 532 3062  | c: +1 917 477 7906
>> > >
>> > > appinions inc.
>> > >
>> > > “The Science of Influence Marketing”
>> > >
>> > > 18 East 41st Street
>> > >
>> > > New York, NY 10017
>> > >
>> > > t: @appinions  | g+:
>> > > plus.google.com/appinions<
>> > >
>> >
>> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
>> > > >
>> > > w: appinions.com 
>> > >
>> > >
>> > > On Tue, Oct 29, 2013 at 4:47 PM, Dileepa Jayakody <
>> > > dileepajayak...@gmail.com
>> > > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I'm a newbie to Solr, and I have a requirement to import data from a
>> > > > MySQL database, enhance the imported content to identify Persons
>> > > > mentioned, and index them as a separate field in Solr along with the
>> > > > other fields defined for the original db query.
>> > > >
>> > > > I'm using Apache Stanbol [1] for the content enhancement requirement.
>> > > > I can get enhancement results for 'Person' type data in the content as
>> > > > the enhancement result.
>> > > >
>> > > > The data flow will be:
>> > > > mysql-db > Solr data-import handler > Stanbol enhancer > Solr index
>> > > >
>> > > > For the above requirement I need to perform additional processing in
>> > > > the data-import handler prior to indexing, to send a request to Stanbol
>> > > > and process the enhancement response. I found some related examples on
>> > > > modifying the MySQL data import handler to customize the query results
>> > > > in db-data-config.xml by using a transformer script.
>> > > > As per my requirement, in the data-import handler I need to send a
>> > > > request to Stanbol and proces

Re: Custom Plugin exception : Plugin init failure for [schema.xml]

2013-11-02 Thread Parvin Gasimzade
Hi Shawn,

Thank you for your answer. I have solved the problem.

The problem is that in our code the constructor of TurkishFilterFactory was
declared protected. That works without problems on the 3.x versions of Solr,
but gives the exception I mentioned here on the 4.x versions. By analyzing
the stack trace I saw that it throws an InstantiationException; making the
constructor public solves the problem.
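
For anyone who hits the same thing, the fix boils down to something like
this (just a sketch; on the newer 4.x versions the factory gets its args map
through the constructor, and the constructor has to be public so that Solr
can instantiate the factory):

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class TurkishFilterFactory extends TokenFilterFactory {

    // public, not protected; otherwise Solr 4.x cannot create the factory
    public TurkishFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        // wrap the stream with the actual Turkish filter implementation here
        return input;
    }
}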


On Fri, Nov 1, 2013 at 6:34 PM, Shawn Heisey  wrote:

> On 11/1/2013 4:18 AM, Parvin Gasimzade wrote:
> > I have a problem with custom plugin development in solr 4.x versions. I
> > have developed custom filter and trying to install it but I got following
> > exception.
>
> Later you indicated that you can use it with Solr 3.x without any problem.
>
> Did you recompile your custom plugin against the Solr jars from the new
> version?  There was a *huge* amount of java class refactoring that went
> into the 4.0 version as compared to any 3.x version, and that continues
> with each new 4.x release.
>
> I would bet that if you tried that recompile, it would fail due to
> errors and/or warnings, which you'll need to fix.  There might also be
> operational problems that the compiler doesn't find, due to changes in
> how the underlying APIs get used.
>
> Thanks,
> Shawn
>
>


Re: Background merge errors with Solr 4.4.0 on Optimize call

2013-11-02 Thread Erick Erickson
See: https://issues.apache.org/jira/browse/SOLR-5418

Thanks Matthew and Robert! I'll see if I can get to this this weekend.





On Wed, Oct 30, 2013 at 7:45 AM, Erick Erickson wrote:

> Robert:
>
> Thanks. I'm on my way out the door, so I'll have to put up a JIRA with
> your patch later if it hasn't been done already
>
> Erick
>
>
> On Tue, Oct 29, 2013 at 10:14 PM, Robert Muir  wrote:
>
>> I think it's a bug, but that's just my opinion. I sent a patch to dev@
>> for thoughts.
>>
>> On Tue, Oct 29, 2013 at 6:09 PM, Erick Erickson 
>> wrote:
>> > Hmmm, so you're saying that merging indexes where a field
>> > has been removed isn't handled. So you have some documents
>> > that do have a "what" field, but your schema doesn't have it,
>> > is that true?
>> >
>> > It _seems_ like you could get by by putting the _what_ field back
>> > into your schema, just not sending any data to it in new docs.
>> >
>> > I'll let others who understand merging better than me chime in on
>> > whether this is a case that should be handled or a bug. I pinged the
>> > dev list to see what the opinion is
>> >
>> > Best,
>> > Erick
>> >
>> >
>> > On Mon, Oct 28, 2013 at 6:39 PM, Matthew Shapiro 
>> wrote:
>> >
>> >> Sorry for reposting after I just sent in a reply, but I just looked at the
>> >> error trace more closely and noticed
>> >>
>> >>
>> >>1. Caused by: java.lang.IllegalArgumentException: no such field what
>> >>
>> >>
>> >> The 'what' field was removed at the request of the customer, as they wanted
>> >> the logic behind what gets queried in the "what" field to be on the code
>> >> side instead of the Solr side (for easier changes without having to
>> >> re-index everything). I didn't feel strongly either way, and since they
>> >> are paying me, I took it out.
>> >>
>> >> This makes me wonder if it's crashing while merging because a field that
>> >> used to be there is now gone.  However, this seems odd to me, as Solr
>> >> doesn't even let me delete the old data; instead it's leaving my
>> >> collection in an extremely bad state, and the only remedy I can think of
>> >> is to nuke the index at the filesystem level.
>> >>
>> >> If this is indeed the cause of the crash, is the only way to delete a
>> >> field to completely empty your index first?
>> >>
>> >>
>> >> On Mon, Oct 28, 2013 at 6:34 PM, Matthew Shapiro 
>> wrote:
>> >>
>> >> > Thanks for your response.
>> >> >
>> >> > You were right, solr is logging to the catalina.out file for tomcat.
>> >>  When
>> >> > I click the optimize button in solr's admin interface the following
>> logs
>> >> > are written: http://apaste.info/laup
>> >> >
>> >> > About JVM memory, solr's admin interface is listing JVM memory at
>> 3.1%
>> >> > (221.7MB is dark grey, 512.56MB light grey and 6.99GB total).
>> >> >
>> >> >
>> >> > On Mon, Oct 28, 2013 at 6:29 AM, Erick Erickson <
>> erickerick...@gmail.com
>> >> >wrote:
>> >> >
>> >> >> For Tomcat, the Solr log output is often put into catalina.out
>> >> >> as a default, so the output might be there. You can
>> >> >> configure Solr to send the logs most anywhere you
>> >> >> please, but without some specific setup
>> >> >> on your part the log output just goes to the default
>> >> >> for the servlet container.
>> >> >>
>> >> >> I took a quick glance at the code but since the merges
>> >> >> are happening in the background, there's not much
>> >> >> context for where that error is thrown.
>> >> >>
>> >> >> How much memory is there for the JVM? I'm grasping
>> >> >> at straws a bit...
>> >> >>
>> >> >> Erick
>> >> >>
>> >> >>
>> >> >> On Sun, Oct 27, 2013 at 9:54 PM, Matthew Shapiro 
>> >> wrote:
>> >> >>
>> >> >> > I am working on implementing Solr as the search backend for our web
>> >> >> > system.  So far things have been going well, but today I made some
>> >> >> > schema changes and now things have broken.
>> >> >> >
>> >> >> > I updated the schema.xml file and reloaded the core (via the admin
>> >> >> > interface).  No errors were reported in the logs.
>> >> >> >
>> >> >> > I then pushed 100 records to be indexed.  A call to Commit afterwards
>> >> >> > seemed fine; however, my next call to Optimize caused the following
>> >> >> > errors:
>> >> >> >
>> >> >> > java.io.IOException: background merge hit exception:
>> >> >> > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into
>> _37
>> >> >> > [maxNumSegments=1]
>> >> >> >
>> >> >> > null:java.io.IOException: background merge hit exception:
>> >> >> > _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into
>> _37
>> >> >> > [maxNumSegments=1]
>> >> >> >
>> >> >> >
>> >> >> > Unfortunately, googling for "background merge hit exception" came up
>> >> >> > with two things: a corrupt index or not enough free space.  The host
>> >> >> > machine that's hosting Solr has 227 out of 229GB free (according to
>> >> >> > df -h), so that's not it.
>> >> >> >
>> >> >> >
>> >> >> > I then ran CheckIndex on the index, and got the following results:
>> >> >>

Re: Store Solr OpenBitSets In Solr Indexes

2013-11-02 Thread David Philip
Oh fine, the caution point was useful for me.
Yes, I wanted to do something similar to filter queries. It is not an XY
problem; I am simply trying to implement something as described below.

I have [non-clinical] group sets in the system, and I want to build a bitset
based on the documents belonging to each group and save it.
Then, while searching, I want to retrieve the corresponding bitset from the
Solr engine for the matched documents and execute a logical XOR. [Am I clear
with the problem explanation now?]


So what I am looking for is: if I have to retrieve a bitset instance from
the Solr search engine for the matched documents, how can I get it?
And how do I save the bit mapping for the documents belonging to a
particular group, to enable the XOR operation?

Thanks - David

On Fri, Nov 1, 2013 at 5:05 PM, Erick Erickson wrote:

> Why are you saving this? Because if the bitset you're saving
> has anything to do with, say, filter queries, it's probably useless.
>
> The internal bitsets are often based on the internal Lucene doc ID,
> which will change when segment merges happen, thus the caution.
>
> Otherwise, there's the binary field type you can probably use. It's not very
> efficient, since I believe it uses base-64 encoding under the covers
> though...
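>
> Something like this is what I mean; a rough, untested sketch that packs the
> OpenBitSet's backing long[] into a byte[] for a solr.BinaryField (the field
> name is made up):
>
> import java.nio.ByteBuffer;
>
> import org.apache.lucene.util.OpenBitSet;
> import org.apache.solr.common.SolrInputDocument;
>
> public class BitSetFieldSketch {
>     public static void main(String[] args) {
>         OpenBitSet bits = new OpenBitSet();
>         bits.set(0);
>         bits.set(1000);
>
>         // pack the backing words into bytes for storage in a binary field
>         long[] words = bits.getBits();
>         ByteBuffer buf = ByteBuffer.allocate(words.length * 8);
>         for (long w : words) {
>             buf.putLong(w);
>         }
>
>         SolrInputDocument doc = new SolrInputDocument();
>         // "groupBits" would be declared as solr.BinaryField in schema.xml
>         doc.addField("groupBits", buf.array());
>
>         // reading it back: unpack the bytes into long[] and rebuild with
>         // new OpenBitSet(longs, longs.length)
>     }
> }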
>
> Is this an "XY" problem?
>
> Best,
> Erick
>
>
> On Wed, Oct 30, 2013 at 8:06 AM, David Philip
> wrote:
>
> > Hi All,
> >
> > What should be the field type if I have to save solr's open bit set value
> > within solr document object and retrieve it later for search?
> >
> >   OpenBitSet bits = new OpenBitSet();
> >
> >   bits.set(0);
> >   bits.set(1000);
> >
> >   doc.addField("SolrBitSets", bits);
> >
> >
> > What should be the field type of  SolrBitSets?
> >
> > Thanks
> >
>


Re: Problem of facet on 170M documents

2013-11-02 Thread Sascha SZOTT
Hi Ming,

Which Solr version are you using? In case you use one of the latest
versions (4.5 or above), try the new parameter facet.threads with a
reasonable value (4 to 8 gave me a massive performance speedup when
working with large facets, i.e. nTerms >> 10^7).
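
With your query that would be something along the lines of:

  fq=source:Video&facet=true&facet.field=url&facet.limit=500&facet.threads=4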

-Sascha


Mingfeng Yang wrote:
> I have an index with 170M documents, and two of the fields for each
> doc are "source" and "url".  I want to know the top 500 most
> frequent urls from the Video source.
> 
> So I did a facet with 
> "fq=source:Video&facet=true&facet.field=url&facet.limit=500", and
> the matching documents number about 9 million.
> 
> The Solr cluster is hosted on two EC2 instances, each with 4 CPUs and
> 32GB memory; 16GB is allocated for the Java heap.  4 master shards on one
> machine, and 4 replicas on another machine, connected together via
> ZooKeeper.
> 
> Whenever I run the query above, the response just takes too long
> and the client times out. Sometimes the end user is
> impatient, waits a few seconds for the results,
> kills the connection, and then issues the same query again and
> again.  Then the server has to deal with multiple such heavy
> queries simultaneously and gets so busy that we get a "no server
> hosting shard" error, probably due to lost communication between the Solr
> node and ZooKeeper.
> 
> Is there any way to deal with this problem?
> 
> Thanks, Ming
>