Hourly Faceting

2013-02-08 Thread Cool Techi
Hi,

I want to facet results on an hourly basis. The following query gives me an 
hourly breakdown, but with the date part; I want just the hour part across the 
days. Is there any other way of doing this?


q           = twitterId:191343557
facet       = true
range field = createdOnGMTDate
range start = 2013-02-01T00:00:00Z-330MINUTES
range gap   = +1HOUR
range end   = 2013-02-08T23:59:59Z-330MINUTES
mincount    = 0


Result: one facet bucket per date-and-hour (a separate count for each hour of 
each day in the range)

Desired Result: one facet bucket per hour of day (counts aggregated across all 
days)

Regards,
Ayush
  

Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Mark Miller
A soft commit is just like a hard commit, but it doesn't do things like resolve 
deletes or call fsync on all the files that were written to disk. It will, 
however, flush to disk.

Hard commits are for durability if you are not using the update log. If you are 
using the update log, hard commits are about flushing the update log to disk 
(e.g. keeping update log RAM usage down).

Soft commits are more about visibility - because a soft commit won't do things 
like fsync, it won't guarantee that the segment flushed to disk will survive a 
hard crash, but it will flush to disk and open a new view on that flushed 
segment.
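
For reference, this split maps onto the commit settings in solrconfig.xml; a 
minimal sketch, with purely illustrative intervals:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- update log: enables recovery and realtime get -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- hard commit: durability; openSearcher=false means no new searcher -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: visibility; opens a new searcher frequently -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>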

- Mark

On Feb 7, 2013, at 11:29 PM, Alexandre Rafalovitch  wrote:

> Hello,
> 
> What actually happens when using soft (as opposed to hard) commit?
> 
> I understand the very high-level picture (documents become available
> faster, but you may lose them on power loss).
> I don't care about low-level implementation details.
> 
> But I am trying to understand what is happening at the medium level of
> detail.
> 
> For example, what are the stages of a document if we are using all available
> transaction log, soft commit, and hard commit options? It feels like there
> are three stages:
> *) Uncommitted (soft or hard): accessible only via direct real-time get?
> *) Soft-committed: accessible through all search operations? (but not on
> disk? but where is it? in memory?)
> *) Hard-committed: all the same as soft-committed, but it is now on disk
> 
> Similarly, in the performance section of the Wiki, it says: "A commit
> (including a soft commit) will free up almost all heap memory" - why would
> a soft commit free up heap memory? I thought it was not flushed to disk.
> 
> Also, with soft commits and the transaction log enabled, doesn't the
> transaction log allow replaying/recovering the latest state after a crash?
> I believe that's what the transaction log does for a database. If not, how
> does one recover, if at all?
> 
> And where does openSearcher=false fit into this? Does it cause
> inconsistent results somehow?
> 
> I am missing something, but I am not sure what or where. Any points in the
> right direction would be appreciated.
> 
> Regards,
> Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Custom update handler

2013-02-08 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm trying to build a custom update handler to accomplish one specific task. In 
our app we do query suggestions based on previous queries passed into our 
frontend app. Instead of getting these queries from the Solr logs, we store 
them in a separate core. So far so good, but one particular requirement is that 
not every query typed by users in the search box should appear as a suggestion, 
only the most popular ones. For this we created a field in the schema called 
count, and wrote code in our frontend to increase this value; to be honest, we 
don't like this. So we came up with the idea of writing a custom update handler 
that, before storing the query in the index, checks whether the query already 
exists and then adds 1 to the counter.
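
A minimal sketch of what such a processor might look like (assuming the 
suggestion core has stored "query" and "count" fields; the class name is 
illustrative, the factory would still need to be registered in an 
updateRequestProcessorChain in solrconfig.xml, and the searcher lookup only 
sees already-committed suggestions):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
import org.apache.solr.util.RefCounted;

public class QueryCountProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.solrDoc;
        String query = (String) doc.getFieldValue("query");
        long count = 1;
        RefCounted<SolrIndexSearcher> ref = req.getCore().getSearcher();
        try {
          // look for an already-indexed suggestion with the same query text
          // (assumes "query" is indexed as a single token)
          int docId = ref.get().getFirstMatch(new Term("query", query));
          if (docId != -1) {
            Document existing = ref.get().doc(docId);
            count = Long.parseLong(existing.get("count")) + 1;
          }
        } finally {
          ref.decref();
        }
        doc.setField("count", count);
        super.processAdd(cmd); // hand off to the rest of the chain (e.g. dedupe)
      }
    };
  }
}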

The thing is that right now we have set up a dedupe component to avoid storing 
very similar queries. Is there any way of accessing the dedupe component from 
the custom update handler? Is there any documentation I can check out to see 
anything similar to this?

Greetings

Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Jack Krupansky
If you check the revision history of the wiki page, a Mr. jayqhacker added 
the quoted statement on November 26, 2012. I don't recognize his "name" as 
being a "known authority" on anything related to Solr, so maybe his 
uncorroborated comments should be taken with a grain of salt.


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Friday, February 08, 2013 6:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Trying to understand soft vs hard commit vs transaction log

Sorry Shawn,

Somehow I am still not quite grasping it. I would really appreciate it if
somebody (or even you) could have another go at a very small part of this.
Maybe that will clear it up:

> Similarly, in the performance section of the Wiki, it says: "A commit
> (including a soft commit) will free up almost all heap memory"

Why? What is the "hard work" that a hard commit does and a soft commit does
not, but that still commits to disk? Is it some sort of Lucene segment
finalization and new segment creation?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Feb 8, 2013 at 2:57 AM, Shawn Heisey  wrote:


On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:


Hello,

What actually happens when using soft (as opposed to hard) commit?

I understand the very high-level picture (documents become available
faster, but you may lose them on power loss).
I don't care about low-level implementation details.

But I am trying to understand what is happening at the medium level of
detail.

For example, what are the stages of a document if we are using all available
transaction log, soft commit, and hard commit options? It feels like there
are three stages:
*) Uncommitted (soft or hard): accessible only via direct real-time get?
*) Soft-committed: accessible through all search operations? (but not on
disk? but where is it? in memory?)
*) Hard-committed: all the same as soft-committed, but it is now on disk

Similarly, in the performance section of the Wiki, it says: "A commit
(including a soft commit) will free up almost all heap memory" - why would
a soft commit free up heap memory? I thought it was not flushed to disk.

Also, with soft commits and the transaction log enabled, doesn't the
transaction log allow replaying/recovering the latest state after a crash?
I believe that's what the transaction log does for a database. If not, how
does one recover, if at all?

And where does openSearcher=false fit into this? Does it cause
inconsistent results somehow?

I am missing something, but I am not sure what or where. Any pointers in
the right direction would be appreciated.



Let's see if I can answer your questions without giving you incorrect
information.

New indexed content is not searchable until you open a new searcher,
regardless of the type of commit that you do.

A hard commit will close the current transaction log and start a new one.
 It will also instruct the Directory implementation to flush to disk.  If
you specify openSearcher=false, then the content that has just been
committed will NOT be searchable, as discussed in the previous paragraph.
 The existing searcher will remain open and continue to serve queries
against the same index data.

A soft commit does not flush the new content to disk, but it does open a
new searcher.  I'm sure that the amount of memory available for caching
this content is not large, so it's possible that if you do a lot of
indexing with soft commits and your hard commits are too infrequent, you'll
end up flushing part of the cached data to disk anyway.  I'd love to hear
from a committer about this, because I could be wrong.

There's a caveat with that 'flush to disk' operation -- the default
Directory implementation in the Solr example config, which is
NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
data and not flush it to disk even with a hard commit.  If your commits are
small, then the net result is similar to a soft commit.  If the server or
Solr were to crash, the transaction logs would be replayed on Solr startup,
recovering that last few megabytes.  The transaction log may also recover
documents that were soft committed, but I'm not 100% sure about that.

To take full advantage of NRT functionality, you can commit as often as
you like with soft commits.  On some reasonable interval, say every one to
fifteen minutes, you can issue a hard commit with openSearcher set to
false, to flush things to disk and cycle through transaction logs before
they get huge.  Solr will keep a few of the transaction logs around, and if
they are huge, it can take a long time to replay them.  You'll want to
choose a hard commit interval that doesn't create giant transaction logs.

If any of the info I've given here is wrong, someone should correct me!

Thanks,
Shawn






Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Jack Krupansky

(Sorry for my split message)...

See the text_en_splitting field type for an example:

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
...

-- Jack Krupansky

-Original Message- 
From: Zhang, Lisheng

Sent: Friday, February 08, 2013 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)


Hi,

In our application we need to call method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser. This is the way it used to work in earlier
versions, and it seems to me that it is the more natural way.

But in current Solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want
to do that, because it will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way, other than changing the Solr
source code.

Thanks very much for helps, Lisheng 



Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Shawn Heisey

On 2/8/2013 4:11 PM, Alexandre Rafalovitch wrote:

Sorry Shawn,

Somehow I am still not quite grasping it. I would really appreciate it if
somebody (or even you) could have another go at a very small part of this.
Maybe that will clear it up:

Similarly, in the performance section of the Wiki, it says: "A commit
(including a soft commit) will free up almost all heap memory"

Why? What is the "hard work" that a hard commit does and a soft commit does
not, but that still commits to disk? Is it some sort of Lucene segment
finalization and new segment creation?


I don't know the answers to those questions, except to say that 
committing to disk involves I/O latency.  With standard hard disks, it's 
a LOT of latency.


Thanks,
Shawn



Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Alexandre Rafalovitch
Sorry Shawn,

Somehow I am still not quite grasping it. I would really appreciate it if
somebody (or even you) could have another go at a very small part of this.
Maybe that will clear it up:

> Similarly, in the performance section of the Wiki, it says: "A commit
> (including a soft commit) will free up almost all heap memory"

Why? What is the "hard work" that a hard commit does and a soft commit does
not, but that still commits to disk? Is it some sort of Lucene segment
finalization and new segment creation?

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Feb 8, 2013 at 2:57 AM, Shawn Heisey  wrote:

> On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
>
>> Hello,
>>
>> What actually happens when using soft (as opposed to hard) commit?
>>
>> I understand the very high-level picture (documents become available
>> faster, but you may lose them on power loss).
>> I don't care about low-level implementation details.
>>
>> But I am trying to understand what is happening at the medium level of
>> detail.
>>
>> For example, what are the stages of a document if we are using all
>> available transaction log, soft commit, and hard commit options? It feels
>> like there are three stages:
>> *) Uncommitted (soft or hard): accessible only via direct real-time get?
>> *) Soft-committed: accessible through all search operations? (but not on
>> disk? but where is it? in memory?)
>> *) Hard-committed: all the same as soft-committed, but it is now on disk
>>
>> Similarly, in the performance section of the Wiki, it says: "A commit
>> (including a soft commit) will free up almost all heap memory" - why would
>> a soft commit free up heap memory? I thought it was not flushed to disk.
>>
>> Also, with soft commits and the transaction log enabled, doesn't the
>> transaction log allow replaying/recovering the latest state after a crash?
>> I believe that's what the transaction log does for a database. If not, how
>> does one recover, if at all?
>>
>> And where does openSearcher=false fit into this? Does it cause
>> inconsistent results somehow?
>>
>> I am missing something, but I am not sure what or where. Any pointers in
>> the right direction would be appreciated.
>>
>
> Let's see if I can answer your questions without giving you incorrect
> information.
>
> New indexed content is not searchable until you open a new searcher,
> regardless of the type of commit that you do.
>
> A hard commit will close the current transaction log and start a new one.
>  It will also instruct the Directory implementation to flush to disk.  If
> you specify openSearcher=false, then the content that has just been
> committed will NOT be searchable, as discussed in the previous paragraph.
>  The existing searcher will remain open and continue to serve queries
> against the same index data.
>
> A soft commit does not flush the new content to disk, but it does open a
> new searcher.  I'm sure that the amount of memory available for caching
> this content is not large, so it's possible that if you do a lot of
> indexing with soft commits and your hard commits are too infrequent, you'll
> end up flushing part of the cached data to disk anyway.  I'd love to hear
> from a committer about this, because I could be wrong.
>
> There's a caveat with that 'flush to disk' operation -- the default
> Directory implementation in the Solr example config, which is
> NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> data and not flush it to disk even with a hard commit.  If your commits are
> small, then the net result is similar to a soft commit.  If the server or
> Solr were to crash, the transaction logs would be replayed on Solr startup,
> recovering that last few megabytes.  The transaction log may also recover
> documents that were soft committed, but I'm not 100% sure about that.
>
> To take full advantage of NRT functionality, you can commit as often as
> you like with soft commits.  On some reasonable interval, say every one to
> fifteen minutes, you can issue a hard commit with openSearcher set to
> false, to flush things to disk and cycle through transaction logs before
> they get huge.  Solr will keep a few of the transaction logs around, and if
> they are huge, it can take a long time to replay them.  You'll want to
> choose a hard commit interval that doesn't create giant transaction logs.
>
> If any of the info I've given here is wrong, someone should correct me!
>
> Thanks,
> Shawn
>
>


Re: Can Solr analyze content and find dates and places

2013-02-08 Thread SUJIT PAL
Hi Bart,

I did some work with UIMA, but this was to annotate the data before it goes to 
Lucene/Solr, i.e., not built as an UpdateRequestProcessor. I just looked through 
the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you 
will have to set up your own aggregate analysis chain in place of the one 
currently configured.

Writing UIMA annotators is very simple (there is a tutorial here:  
[http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
 You provide the XML description for the annotation and let UIMA generate the 
annotation bean. You write Java code for the annotator and also the annotator 
XML descriptor, which UIMA uses to instantiate and run your annotator. Overall, 
it sounds really complicated, but it's actually quite simple.
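
As a rough illustration of that shape (a toy sketch, not taken from the
tutorial; DateAnnotation stands in for the JCas class that UIMA generates from
your XML type descriptor):

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;

public class SimpleDateAnnotator extends JCasAnnotator_ImplBase {
  // toy pattern for ISO-style dates; a real annotator would use richer rules
  private static final Pattern DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

  @Override
  public void process(JCas jcas) throws AnalysisEngineProcessException {
    Matcher m = DATE.matcher(jcas.getDocumentText());
    while (m.find()) {
      // DateAnnotation is the annotation bean generated from the descriptor
      DateAnnotation ann = new DateAnnotation(jcas, m.start(), m.end());
      ann.addToIndexes();
    }
  }
}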

The tutorial has quite a few examples that you will find useful, but in case 
you need more, I have some on this github repository:
[https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]

The dictionary and pattern annotators may be similar to what you are looking 
for (date and city annotators).

Best regards,
Sujit

On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:

> Hi Alex,
> 
> Indeed that is exactly what I am trying to achieve using worldcities. Date 
> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do 
> I integrate the Java library as UIMA? The documentation about changing 
> schema.xml and solr.xml is not very detailed. 
> 
> Regards, Bart
> 
> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch  wrote:
> 
>> Hi Bart,
>> 
>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
>> sure I can help much further. But in general, you are venturing into pure
>> research territory here.
>> 
>> Even for dates, what do you actually mean? Just fixed expression? Relative
>> dates (e.g. last tuesday?). What about times (7pm?).
>> 
>> Same with cities. If you want it offline, you need the gazetteer and
>> disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
>> lot of duplicate names (Paris, Ontario is apparently a short drive from
>> London, Ontario eh?). Something like
>> http://www.maxmind.com/en/worldcities? And disambiguation usually
>> requires training corpus that is similar to
>> what your text will look like.
>> 
>> Online services like OpenCalais are backed by gigantic databases and some
>> serious corpus-training Machine Language disambiguation algorithms.
>> 
>> So, no plug-and-play solution here. If you really need to get this done, I
>> would recommend narrowing down the specification of exactly what you will
>> settle for and looking for software that can do it. Once you have that,
>> integration with Solr is your next - and smaller - concern.
>> 
>> Regards,
>>  Alex.
>> 
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all at
>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>> 
>> 
>> On Fri, Feb 8, 2013 at 10:41 AM, jazz  wrote:
>> 
>>> Thanks Alex,
>>> 
>>> I checked the documentation but it seems there is only a webservice
>>> (OpenCalais) available to extract dates and places.
>>> 
>>> http://uima.apache.org/sandbox.html
>>> 
>>> Do you know if there is a Solr Compatible UIMA add-on which detects dates
>>> and places (cities) without a webservice? If not, how do you write one?
>>> 
>>> Regards, Bart
>>> 
>>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>> 
 Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
 probably in Update Request Processor pipeline.
 
 Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
 
 You will have to put some serious work into this, it is not all tied
 together and packaged. Mostly because the Natural Language Processing
>>> (the
 field you are getting into) is kind of messy all of its own.
 
 Good luck,
  Alex.
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Fri, Feb 8, 2013 at 9:24 AM, jazz  wrote:
 
> Hi,
> 
> I want to know if Solr can analyze text and recognize dates and places. If
> yes, is it then possible to create new dynamic fields with these dates and
> places (e.g. city).
> 
> Thanks, Bart
> 
>>> 
>>> 



RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Zhang, Lisheng
Thanks very much for your valuable help, it worked perfectly !!!

Lisheng

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, February 08, 2013 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr query parser, needs to call
setAutoGeneratePhraseQueries(true)


Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" 
to all of your "text" field types in your schema.xml.

See the text_en_splitting field type for an example:

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
...

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky
Sent: Friday, February 08, 2013 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr query parser, needs to call 
setAutoGeneratePhraseQueries(true)

Simply add the "autoGeneratePhraseQueries" attribute with a value of "true"
to all of your "text" field types in your schema.xml.

See the text_




-- Jack Krupansky
-Original Message- 
From: Zhang, Lisheng
Sent: Friday, February 08, 2013 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)


Hi,

In our application we need to call method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser. This is the way it used to work in earlier
versions, and it seems to me that it is the more natural way.

But in current Solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want
to do that, because it will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way, other than changing the Solr
source code.

Thanks very much for helps, Lisheng 



Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Jack Krupansky
Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" 
to all of your "text" field types in your schema.xml.


See the text_en_splitting field type for an example:

<fieldType name="text_en_splitting" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
...

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Friday, February 08, 2013 3:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr query parser, needs to call 
setAutoGeneratePhraseQueries(true)


Simply add the "autoGeneratePhraseQueries" attribute with a value of "true"
to all of your "text" field types in your schema.xml.

See the text_




-- Jack Krupansky
-Original Message- 
From: Zhang, Lisheng

Sent: Friday, February 08, 2013 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)


Hi,

In our application we need to call method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser. This is the way it used to work in earlier
versions, and it seems to me that it is the more natural way.

But in current Solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want
to do that, because it will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way, other than changing the Solr
source code.

Thanks very much for helps, Lisheng 



Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Jack Krupansky
Simply add the "autoGeneratePhraseQueries" attribute with a value of "true" 
to all of your "text" field types in your schema.xml.


See the text_




-- Jack Krupansky
-Original Message- 
From: Zhang, Lisheng

Sent: Friday, February 08, 2013 3:20 PM
To: solr-user@lucene.apache.org
Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)


Hi,

In our application we need to call method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser. This is the way it used to work in earlier
versions, and it seems to me that it is the more natural way.

But in current Solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want
to do that, because it will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way, other than changing the Solr
source code.

Thanks very much for helps, Lisheng 



Solr query parser, needs to call setAutoGeneratePhraseQueries(true)

2013-02-08 Thread Zhang, Lisheng

Hi,

In our application we need to call method

setAutoGeneratePhraseQueries(true)

on the Lucene QueryParser. This is the way it used to work in earlier
versions, and it seems to me that it is the more natural way.

But in current Solr 3.6.1, the only way to do so is to set

<luceneMatchVersion>LUCENE_30</luceneMatchVersion>

in solrconfig.xml (if I read the source code correctly), but I do not want
to do that, because it will change the whole behavior of Lucene, and I only
want to change this query parser behavior, not other Lucene features.

Please guide me if there is a better way, other than changing the Solr
source code.

Thanks very much for helps, Lisheng


RE: Can Solr analyze content and find dates and places

2013-02-08 Thread Markus Jelsma
Bart,

For Apache Nutch we built a date extractor that relies on some regular 
expressions to extract sequences that resemble dates, and then passes the 
extracted candidates through a list of Java date formats together with the 
identified language (DateFormat is locale-aware). With it we can extract many 
exotic dates from arbitrary text in many languages.

An older but working patch with example date formats and regular expressions 
exists for Apache Nutch. The relevant parts of the code should be easy to 
implement in your application if you're using Java.

https://issues.apache.org/jira/browse/NUTCH-1414
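
A toy sketch of that two-step approach (the pattern and format list here are
illustrative, not the ones shipped with the patch):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateSniffer {
  // step 1: a regex finds sequences that merely resemble dates
  private static final Pattern CANDIDATE =
      Pattern.compile("\\b\\d{1,2}[./-]\\d{1,2}[./-]\\d{2,4}\\b");
  // step 2: locale-aware formats decide which candidates really parse
  private static final String[] FORMATS = {"dd.MM.yyyy", "MM/dd/yyyy", "dd-MM-yy"};

  public static List<Date> extract(String text, Locale locale) {
    List<Date> dates = new ArrayList<Date>();
    Matcher m = CANDIDATE.matcher(text);
    while (m.find()) {
      for (String f : FORMATS) {
        SimpleDateFormat sdf = new SimpleDateFormat(f, locale); // locale-aware
        sdf.setLenient(false);
        try {
          dates.add(sdf.parse(m.group()));
          break;
        } catch (ParseException ignored) {
          // candidate did not match this format; try the next one
        }
      }
    }
    return dates;
  }
}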

If you're handling multiple languages, locale information is very important. 
That applies to a UIMA annotator as well.

Cheers,
Markus
 
 
-Original message-
> From:Bart Rijpers 
> Sent: Fri 08-Feb-2013 17:51
> To: solr-user@lucene.apache.org
> Subject: Re: Can Solr analyze content and find dates and places
> 
> Hi Alex,
> 
> Indeed that is exactly what I am trying to achieve using worldcities. Date 
> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do 
> I integrate the Java library as UIMA? The documentation about changing 
> schema.xml and solr.xml is not very detailed. 
> 
> Regards, Bart
> 
> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch  wrote:
> 
> > Hi Bart,
> > 
> > I haven't done any UIMA work (I used other stuff for my NLP phase), so not
> > sure I can help much further. But in general, you are venturing into pure
> > research territory here.
> > 
> > Even for dates, what do you actually mean? Just fixed expression? Relative
> > dates (e.g. last tuesday?). What about times (7pm?).
> > 
> > Same with cities. If you want it offline, you need the gazetteer and
> > disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
> > lot of duplicate names (Paris, Ontario is apparently a short drive from
> > London, Ontario eh?). Something like
> > http://www.maxmind.com/en/worldcities? And disambiguation usually
> > requires training corpus that is similar to
> > what your text will look like.
> > 
> > Online services like OpenCalais are backed by gigantic databases and some
> > serious corpus-training Machine Language disambiguation algorithms.
> > 
> > So, no plug-and-play solution here. If you really need to get this done, I
> > would recommend narrowing down the specification of exactly what you will
> > settle for and looking for software that can do it. Once you have that,
> > integration with Solr is your next - and smaller - concern.
> > 
> > Regards,
> >   Alex.
> > 
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> > 
> > 
> > On Fri, Feb 8, 2013 at 10:41 AM, jazz  wrote:
> > 
> >> Thanks Alex,
> >> 
> >> I checked the documentation but it seems there is only a webservice
> >> (OpenCalais) available to extract dates and places.
> >> 
> >> http://uima.apache.org/sandbox.html
> >> 
> >> Do you know if there is a Solr Compatible UIMA add-on which detects dates
> >> and places (cities) without a webservice? If not, how do you write one?
> >> 
> >> Regards, Bart
> >> 
> >> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
> >> 
> >>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
> >>> probably in Update Request Processor pipeline.
> >>> 
> >>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
> >>> 
> >>> You will have to put some serious work into this, it is not all tied
> >>> together and packaged. Mostly because the Natural Language Processing
> >> (the
> >>> field you are getting into) is kind of messy all of its own.
> >>> 
> >>> Good luck,
> >>>   Alex.
> >>> 
> >>> Personal blog: http://blog.outerthoughts.com/
> >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >>> - Time is the quality of nature that keeps events from happening all at
> >>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >>> 
> >>> 
> >>> On Fri, Feb 8, 2013 at 9:24 AM, jazz  wrote:
> >>> 
>  Hi,
>  
>  I want to know if Solr can analyze text and recognize dates and places. If
>  yes, is it then possible to create new dynamic fields with these dates and
>  places (e.g. city).
>  
>  Thanks, Bart
>  
> >> 
> >> 
> 


Global .properties file for all Solr cores?

2013-02-08 Thread Hayden Muhl
I've read the documentation about how you can configure a Solr core with a
properties file. Is there any way to specify a properties file that will
apply to all cores running on a server?

Here's my scenario. I have a solr setup where I have two cores, "foo" and
"bar". I want to enable replication using properties, as is suggested on
the wiki.

http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

I would like my master/slave settings to apply to all cores on a box, but I
would still like to have separate solrcore.properties files so that other
properties can be set per core. In other words, I would like a setup like
this, with three files.

#solr.properties
# These properties should apply to all cores on a box
enable.master=true
enable.slave=false

#foo.solrcore.properties
# These properties only apply to core foo
filterCache.size=16384

#bar.solrcore.properties
# These properties only apply to core bar
filterCache.size=2048

What I'm trying to avoid is having to duplicate the global values across
all solrcore.properties files.

I've looked into having a .properties file that applies to the whole
context, but we are running Tomcat, which does not make this easy. It seems
the only way to do this with Tomcat is with the CATALINA_OPTS environment
variable, and I would rather duplicate values across solrcore.properties
files than use CATALINA_OPTS.

- Hayden


Change client to http1.1

2013-02-08 Thread mbehlok
Good day,

I am using a protocol http client that supports HTTP 1.1, and I changed the
HttpBase property useHttp11 to true, but capturing packets still shows the
requests going out as HTTP 1.0. This seems to be affecting my crawling.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Change-client-to-http1-1-tp4039279.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can Solr analyze content and find dates and places

2013-02-08 Thread Bart Rijpers
Hi Alex,

Indeed that is exactly what I am trying to achieve using worldcities. Dates will 
be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I 
integrate the Java library with UIMA? The documentation about changing schema.xml 
and solr.xml is not very detailed. 

Regards, Bart

On 8 Feb 2013, at 16:57, Alexandre Rafalovitch  wrote:

> Hi Bart,
> 
> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
> sure I can help much further. But in general, you are venturing into pure
> research territory here.
> 
> Even for dates, what do you actually mean? Just fixed expression? Relative
> dates (e.g. last tuesday?). What about times (7pm?).
> 
> Same with cities. If you want it offline, you need the gazetteer and
> disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
> lot of duplicate names (Paris, Ontario is apparently a short drive from
> London, Ontario eh?). Something like
> http://www.maxmind.com/en/worldcities? And disambiguation usually
> requires training corpus that is similar to
> what your text will look like.
> 
> Online services like OpenCalais are backed by gigantic databases and some
> serious corpus-training Machine Language disambiguation algorithms.
> 
> So, no plug-and-play solution here. If you really need to get this done, I
> would recommend narrowing down the specification of exactly what you will
> settle for and looking for software that can do it. Once you have that,
> integration with Solr is your next - and smaller - concern.
> 
> Regards,
>   Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Fri, Feb 8, 2013 at 10:41 AM, jazz  wrote:
> 
>> Thanks Alex,
>> 
>> I checked the documentation but it seems there is only a webservice
>> (OpenCalais) available to extract dates and places.
>> 
>> http://uima.apache.org/sandbox.html
>> 
>> Do you know if there is a Solr Compatible UIMA add-on which detects dates
>> and places (cities) without a webservice? If not, how do you write one?
>> 
>> Regards, Bart
>> 
>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>> 
>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
>>> probably in Update Request Processor pipeline.
>>> 
>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>> 
>>> You will have to put some serious work into this, it is not all tied
>>> together and packaged. Mostly because the Natural Language Processing
>> (the
>>> field you are getting into) is kind of messy all of its own.
>>> 
>>> Good luck,
>>>   Alex.
>>> 
>>> Personal blog: http://blog.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all at
>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>> 
>>> 
>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz  wrote:
>>> 
 Hi,
 
 I want to know if Solr can analyze text and recognize dates and places. If
 yes, is it then possible to create new dynamic fields with these dates and
 places (e.g. city).
 
 Thanks, Bart
 
>> 
>> 


Re: Can Solr analyze content and find dates and places

2013-02-08 Thread Alexandre Rafalovitch
Hi Bart,

I haven't done any UIMA work (I used other stuff for my NLP phase), so not
sure I can help much further. But in general, you are venturing into pure
research territory here.

Even for dates, what do you actually mean? Just fixed expression? Relative
dates (e.g. last tuesday?). What about times (7pm?).

Same with cities. If you want it offline, you need the gazetteer and
disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
lot of duplicate names (Paris, Ontario is apparently a short drive from
London, Ontario eh?). Something like
http://www.maxmind.com/en/worldcities? And disambiguation usually
requires training corpus that is similar to
what your text will look like.

Online services like OpenCalais are backed by gigantic databases and some
serious corpus-training Machine Language disambiguation algorithms.

So, no plug-and-play solution here. If you really need to get this done, I
would recommend narrowing down the specification of exactly what you will
settle for and looking for software that can do it. Once you have that,
integration with Solr is your next - and smaller - concern.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Feb 8, 2013 at 10:41 AM, jazz  wrote:

> Thanks Alex,
>
> I checked the documentation but it seems there is only a webservice
> (OpenCalais) available to extract dates and places.
>
> http://uima.apache.org/sandbox.html
>
> Do you know if there is a Solr Compatible UIMA add-on which detects dates
> and places (cities) without a webservice? If not, how do you write one?
>
> Regards, Bart
>
> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>
> > Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
> > probably in Update Request Processor pipeline.
> >
> > Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
> >
> > You will have to put some serious work into this, it is not all tied
> > together and packaged. Mostly because the Natural Language Processing
> (the
> > field you are getting into) is kind of messy all of its own.
> >
> > Good luck,
> >Alex.
> >
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> >
> >
> > On Fri, Feb 8, 2013 at 9:24 AM, jazz  wrote:
> >
> >> Hi,
> >>
> >> I want to know if Solr can analyze text and recognize dates and places.
> >> If yes, is it then possible to create new dynamic fields with these
> >> dates and places (e.g. city).
> >>
> >> Thanks, Bart
> >>
>
>


Re: Can Solr analyze content and find dates and places

2013-02-08 Thread jazz
Thanks Alex,

I checked the documentation but it seems there is only a webservice 
(OpenCalais) available to extract dates and places.

http://uima.apache.org/sandbox.html

Do you know if there is a Solr Compatible UIMA add-on which detects dates and 
places (cities) without a webservice? If not, how do you write one?

Regards, Bart

On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:

> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
> probably in Update Request Processor pipeline.
> 
> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
> 
> You will have to put some serious work into this, it is not all tied
> together and packaged. Mostly because the Natural Language Processing (the
> field you are getting into) is kind of messy all of its own.
> 
> Good luck,
>Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Fri, Feb 8, 2013 at 9:24 AM, jazz  wrote:
> 
>> Hi,
>> 
>> I want to know if Solr can analyze text and recognize dates and places. If
>> yes, is it then possible to create new dynamic fields with these dates and
>> places (e.g. city).
>> 
>> Thanks, Bart
>> 



copy Field / postprocess Fields after analyze / dynamic analyzer config

2013-02-08 Thread Kai Gülzau
Is there a way to postprocess a field after analysis?

By postprocessing I mean renaming, moving, or appending fields.


Some more information:

My schema.xml contains several language-suffixed fields (nouns_de, ...).
Each of these is analyzed in a language-dependent way:

[per-language fieldType and field definitions omitted]

When I do a faceted search I have to include every field/language combination, 
since I do not know the language at query time:

http://localhost:8983/solr/master/select?q=*:*&rows=0&facet=true&facet.field=nouns_de&facet.field=nouns_en&facet.field=nouns_fr&facet.field=nouns_nl
 ...

So I have to merge all terms in my own business logic :-(


Any idea / pointer on how to rename fields after analysis?

This post says it's not possible with the current API:
http://lucene.472066.n3.nabble.com/copyField-after-analyzer-td3900337.html


Another approach would be to allow analyzer configuration depending on another 
field value (language).


regards,

Kai Gülzau



Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Shawn Heisey

On 2/8/2013 3:12 AM, Isaac Hebsh wrote:

Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory?


MMapDirectoryFactory should flush everything to disk on a hard commit 
and not keep anything in RAM.  I *think* that soft commits still end up 
in RAM with this implementation, but you'll want to wait for someone who 
actually knows to confirm or deny that.


Just FYI, NRTCachingDirectoryFactory is a wrapper class - implementing 
caching functionality and using MMapDirectoryFactory to actually contact 
the disk.


If indexing and/or startup performance concerns have led you to turn off 
the updateLog, MMapDirectoryFactory is the correct implementation to 
use.  Using the NRT default without the updateLog will lead to data loss 
if anything crashes.


Thanks,
Shawn



Re: Can Solr analyze content and find dates and places

2013-02-08 Thread Alexandre Rafalovitch
Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
probably in Update Request Processor pipeline.

Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA

You will have to put some serious work into this, it is not all tied
together and packaged. Mostly because the Natural Language Processing (the
field you are getting into) is kind of messy all of its own.

Good luck,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Feb 8, 2013 at 9:24 AM, jazz  wrote:

> Hi,
>
> I want to know if Solr can analyze text and recognize dates and places. If
> yes, is it then possible to create new dynamic fields with these dates and
> places (e.g. city).
>
> Thanks, Bart
>


RE: which analyzer is used for facet.query?

2013-02-08 Thread Kai Gülzau
> So it seems that facet.query is using the analyzer of type index.
> Is it a bug or is there another analyzer type for the facet query?

Nobody?
Should I file a bug?

Kai

-Original Message-
From: Kai Gülzau [mailto:kguel...@novomind.com] 
Sent: Tuesday, February 05, 2013 2:31 PM
To: solr-user@lucene.apache.org
Subject: which analyzer is used for facet.query?

Hi all,

which analyzer is used for the facet.query?


This is my schema.xml:

[fieldType definition with a UIMA-based index analyzer omitted]



When doing a faceting search like:

http://localhost:8983/solr/slave/select?q=*:*&fq=type:7&rows=0&wt=json&indent=true&facet=true&facet.query=albody_de:Klaus

The UIMA whitespace tokenizer logs some info:
Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer starts processing"
Feb 05, 2013 2:23:06 PM WhitespaceTokenizer process Information: "Whitespace tokenizer finished processing"


So it seems that facet.query is using the analyzer of type index.
Is it a bug or is there another analyzer type for the facet query?

Regards,

Kai Gülzau





Can Solr analyze content and find dates and places

2013-02-08 Thread jazz
Hi,

I want to know if Solr can analyze text and recognize dates and places. If yes, 
is it then possible to create new dynamic fields with these dates and places 
(e.g. city).

Thanks, Bart


Re: solr file based spell suggestions

2013-02-08 Thread Rohan Thakur
hi,

thanks, I configured that using synonym mapping; it's now giving sII results
when searching for s2.
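
For reference, the mapping might look like this in synonyms.txt (illustrative):

# rewrite the user's form to the indexed form
s2 => sII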

thanks
regards
Rohan
On Thu, Feb 7, 2013 at 7:15 PM, Jack Krupansky wrote:

> Changing "x" to "y" (e.g., "s2" to "sII") is not a function of "spell
> check" or "suggestion".
>
> Synonyms are a closer match, but can be difficult to configure properly.
> Good luck.
>
> You may be better off preprocessing the query at the application level and
> then generating the appropriate boolean logic, such as: "(s2 OR sII)".
>
> -- Jack Krupansky
>
> -Original Message- From: Rohan Thakur
> Sent: Thursday, February 07, 2013 8:24 AM
> To: solr-user@lucene.apache.org
> Subject: solr file based spell suggestions
>
>
> hi all
>
> I wanted to know how can I apply file based dictionary for spell
> suggestions such that if I search for s2 in the query it would take it as
> sII which also represent same thing in my indexed field...but as in search
> it can also be interpreted as s2 please help anyone...
>
> thanks
> regards
> Rohan
>


ExtractingRequestHandler literals

2013-02-08 Thread marotosg
Hi,

I am trying to index some documents using ExtractingRequestHandler and tika.
Solr 3.6
I would like to add some extra data coming from a different source using
literal.

My schema contains these fields:

[field definitions omitted; DocumentID is a required field]

My URL:
http://dzoagent001:8080/solr/document/update/extract?commit=true&stream.file=//DZOAGENT001/ShareFolder/file.txt&literal.DocumentID=125

It looks like the literals are not working properly. Any idea?
*Error* 

SEVERE: org.apache.solr.common.SolrException: [doc=null] missing required
field: DocumentID
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:355)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865)
at
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
at java.lang.Thread.run(Thread.java:679)

Thanks







--
View this message in context: 
http://lucene.472066.n3.nabble.com/ExtractingRequestHandler-literals-tp4039222.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud new zookeper node on different ip/ replicate between two clasters

2013-02-08 Thread mizayah
I don't think it's so simple.

First, I need at least 3 ZooKeeper nodes to keep failover for one server.
Second, after one ZooKeeper node dies, I need to restart all Solr nodes.

Let me ask a simpler question.
Two data centers.
How do I replicate two Solr clusters between two datacenters?
Outside of SolrCloud there is the repeater; if I connect all SolrCloud nodes
in one cluster across the datacenters, I will create a lot of traffic between
them. Not to mention that I will eventually get a leader elected in the wrong
datacenter.

How can I have two clusters of Solr and replicate them between two
datacenters?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-new-zookeper-node-on-different-ip-replicate-between-two-clasters-tp4039101p4039217.html
Sent from the Solr - User mailing list archive at Nabble.com.


unable to get the same results for ab and a b due to whitespace in solr

2013-02-08 Thread soumya vinukonda
Hi All,

I have a synonyms.txt file where ab should give the same results as a b, but
when I search for ab it gives 4 results and a b gives 104 results.

I tried giving a+b, but I don't know how to specify the + through schema.xml.
Please help.

I tried giving expand=false when indexing, but the result is the same.

i have gone through the link
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

but it didn't help me.
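
For what it's worth, synonyms.txt does accept multi-word entries, so a mapping
along these lines is possible (a sketch; whether it helps depends on how the
field is tokenized at index and query time):

# collapse the two-token form into the single token
a b => ab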



--
View this message in context: 
http://lucene.472066.n3.nabble.com/unable-to-get-the-same-results-for-ab-and-a-b-due-to-whitespace-in-solr-tp4039212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Isaac Hebsh
Shawn, what about 'flush to disk' behaviour on MMapDirectoryFactory?


On Fri, Feb 8, 2013 at 11:12 AM, Prakhar Birla wrote:

> Great explanation Shawn! BTW, soft committed documents will not be
> recovered on a JVM crash.
>
> On 8 February 2013 13:27, Shawn Heisey  wrote:
>
> > On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
> >
> >> Hello,
> >>
> >> What actually happens when using soft (as opposed to hard) commit?
> >>
> >> I understand the very high-level picture (documents become available
> >> faster, but you may lose them on power loss).
> >> I don't care about low-level implementation details.
> >>
> >> But I am trying to understand what is happening at the medium level of
> >> detail.
> >>
> >> For example, what are the stages of a document if we are using all
> >> available transaction log, soft commit, and hard commit options? It
> >> feels like there are three stages:
> >> *) Uncommitted (soft or hard): accessible only via direct real-time get?
> >> *) Soft-committed: accessible through all search operations? (but not on
> >> disk? but where is it? in memory?)
> >> *) Hard-committed: all the same as soft-committed, but it is now on disk
> >>
> >> Similarly, in the performance section of the Wiki, it says: "A commit
> >> (including a soft commit) will free up almost all heap memory" - why
> >> would a soft commit free up heap memory? I thought it was not flushed
> >> to disk.
> >>
> >> Also, with soft commits and the transaction log enabled, doesn't the
> >> transaction log allow replaying/recovering the latest state after a
> >> crash? I believe that's what the transaction log does for a database.
> >> If not, how does one recover, if at all?
> >>
> >> And where does openSearcher=false fit into this? Does it cause
> >> inconsistent results somehow?
> >>
> >> I am missing something, but I am not sure what or where. Any pointers
> >> in the right direction would be appreciated.
> >>
> >
> > Let's see if I can answer your questions without giving you incorrect
> > information.
> >
> > New indexed content is not searchable until you open a new searcher,
> > regardless of the type of commit that you do.
> >
> > A hard commit will close the current transaction log and start a new one.
> >  It will also instruct the Directory implementation to flush to disk.  If
> > you specify openSearcher=false, then the content that has just been
> > committed will NOT be searchable, as discussed in the previous paragraph.
> >  The existing searcher will remain open and continue to serve queries
> > against the same index data.
> >
> > A soft commit does not flush the new content to disk, but it does open a
> > new searcher.  I'm sure that the amount of memory available for caching
> > this content is not large, so it's possible that if you do a lot of
> > indexing with soft commits and your hard commits are too infrequent,
> you'll
> > end up flushing part of the cached data to disk anyway.  I'd love to hear
> > from a committer about this, because I could be wrong.
> >
> > There's a caveat with that 'flush to disk' operation -- the default
> > Directory implementation in the Solr example config, which is
> > NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> > data and not flush it to disk even with a hard commit.  If your commits
> are
> > small, then the net result is similar to a soft commit.  If the server or
> > Solr were to crash, the transaction logs would be replayed on Solr
> startup,
> > recovering that last few megabytes.  The transaction log may also recover
> > documents that were soft committed, but I'm not 100% sure about that.
> >
> > To take full advantage of NRT functionality, you can commit as often as
> > you like with soft commits.  On some reasonable interval, say every one
> to
> > fifteen minutes, you can issue a hard commit with openSearcher set to
> > false, to flush things to disk and cycle through transaction logs before
> > they get huge.  Solr will keep a few of the transaction logs around, and
> if
> > they are huge, it can take a long time to replay them.  You'll want to
> > choose a hard commit interval that doesn't create giant transaction logs.
> >
> > If any of the info I've given here is wrong, someone should correct me!
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> Regards,
> Prakhar Birla
> +91 9739868086
>


Re: how-to configure mysql pool connection on Solr Server

2013-02-08 Thread Miguel

Thanks for help

It's a good idea to configure the datasource pool in Jetty or Tomcat and 
then reuse it in my custom plugin.
This page of the Jetty docs: 
http://docs.codehaus.org/display/JETTY/DataSource+Examples

explains how to configure different datasources.
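
On the plugin side, borrowing a connection from such a container-managed pool
via JNDI might look like this (a sketch; the name "jdbc/solrdb" is
hypothetical):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class PooledDb {
  public static Connection open() throws NamingException, SQLException {
    InitialContext ctx = new InitialContext();
    // Tomcat/Jetty publish container-managed DataSources under java:comp/env
    DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/solrdb");
    return ds.getConnection(); // borrowed from the pool; close() returns it
  }
}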

thanks again.

El 07/02/2013 17:35, Michael Della Bitta escribió:

Hello Miguel,

If you set up a JNDI datasource in your servlet container, you can use
that as your database config. Then you just need to use a pooling
datasource:

http://wiki.apache.org/solr/DataImportHandlerFaq#How_do_I_use_a_JNDI_DataSource.3F
http://dev.mysql.com/tech-resources/articles/connection_pooling_with_connectorj.html


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Feb 7, 2013 at 7:20 AM, Miguel
 wrote:

Hi

  I need to configure a MySQL connection pool on the Solr server for use in a
custom plugin. I looked at the DataImportHandler wiki:
http://wiki.apache.org/solr/DataImportHandler , but it seems that
DataImportHandler opens the connection when the handler is called and closes
it when the import finishes, and I need to keep the pool open so I can reuse
connections whenever I need them.

I have not found in the Apache Solr documentation how to define a connection
pool to the DB for reuse in any Solr class.
Any ideas?

thanks





Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Prakhar Birla
Great explanation Shawn! BTW, soft committed documents will not be
recovered on a JVM crash.

On 8 February 2013 13:27, Shawn Heisey  wrote:

> On 2/7/2013 9:29 PM, Alexandre Rafalovitch wrote:
>
>> Hello,
>>
>> What actually happens when using soft (as opposed to hard) commit?
>>
>> I understand the very high-level picture (documents become available
>> faster, but you may lose them on power loss).
>> I don't care about low-level implementation details.
>>
>> But I am trying to understand what is happening at the medium level of
>> detail.
>>
>> For example, what are the stages of a document if we are using all
>> available transaction log, soft commit, and hard commit options? It feels
>> like there are three stages:
>> *) Uncommitted (soft or hard): accessible only via direct real-time get?
>> *) Soft-committed: accessible through all search operations? (but not on
>> disk? but where is it? in memory?)
>> *) Hard-committed: all the same as soft-committed, but it is now on disk
>>
>> Similarly, in the performance section of the Wiki, it says: "A commit
>> (including a soft commit) will free up almost all heap memory" - why would
>> a soft commit free up heap memory? I thought it was not flushed to disk.
>>
>> Also, with soft commits and the transaction log enabled, doesn't the
>> transaction log allow replaying/recovering the latest state after a crash?
>> I believe that's what the transaction log does for a database. If not, how
>> does one recover, if at all?
>>
>> And where does openSearcher=false fit into this? Does it cause
>> inconsistent results somehow?
>>
>> I am missing something, but I am not sure what or where. Any pointers in
>> the right direction would be appreciated.
>>
>
> Let's see if I can answer your questions without giving you incorrect
> information.
>
> New indexed content is not searchable until you open a new searcher,
> regardless of the type of commit that you do.
>
> A hard commit will close the current transaction log and start a new one.
>  It will also instruct the Directory implementation to flush to disk.  If
> you specify openSearcher=false, then the content that has just been
> committed will NOT be searchable, as discussed in the previous paragraph.
>  The existing searcher will remain open and continue to serve queries
> against the same index data.
>
> A soft commit does not flush the new content to disk, but it does open a
> new searcher.  I'm sure that the amount of memory available for caching
> this content is not large, so it's possible that if you do a lot of
> indexing with soft commits and your hard commits are too infrequent, you'll
> end up flushing part of the cached data to disk anyway.  I'd love to hear
> from a committer about this, because I could be wrong.
>
> There's a caveat with that 'flush to disk' operation -- the default
> Directory implementation in the Solr example config, which is
> NRTCachingDirectoryFactory, will cache the last few megabytes of indexed
> data and not flush it to disk even with a hard commit.  If your commits are
> small, then the net result is similar to a soft commit.  If the server or
> Solr were to crash, the transaction logs would be replayed on Solr startup,
> recovering that last few megabytes.  The transaction log may also recover
> documents that were soft committed, but I'm not 100% sure about that.
>
> To take full advantage of NRT functionality, you can commit as often as
> you like with soft commits.  On some reasonable interval, say every one to
> fifteen minutes, you can issue a hard commit with openSearcher set to
> false, to flush things to disk and cycle through transaction logs before
> they get huge.  Solr will keep a few of the transaction logs around, and if
> they are huge, it can take a long time to replay them.  You'll want to
> choose a hard commit interval that doesn't create giant transaction logs.
>
> If any of the info I've given here is wrong, someone should correct me!
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Prakhar Birla
+91 9739868086


Re: Updating data

2013-02-08 Thread anurag.jain
I have a question:

What if the id does not exist in the previous data?

Like:

[
  {
    "id": "6",
    "is_good": {"add": "1"}
  }
]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-data-tp4038492p4039190.html
Sent from the Solr - User mailing list archive at Nabble.com.