Re: Regarding DrillDown search

2014-02-10 Thread Jebarlin Robertson
Dear Shai,

Thank you for the quick response :)

I have checked with PrefixQuery and a term, and it is working fine, but I
think I cannot pass multiple category paths to it. I am calling the
DrillDown.term() method N times, once for each CategoryPath in the list.

I have one more question. When I get the FacetResult, I am getting only
the count of documents matched for each category path.
Is there any way to also get the Document objects along with the count, so
that I know the file names (the title field in the Document), for example
the files which have the same Author, from the FacetResult? I believe I have
read something about this in one of your answers.
There you explained: "Categories will be listed to the user, and when the
user clicks a category we have to do a DrillDown search to get further
results."
I just want to know if we can get the document names as well in the first
facet search itself, at the time we get the count (number of hits) of
documents with the FacetResult. Is there a solution available already, or
what can I do for that?

Kindly guide me :)

Thank you for All your Support.

Regards,
Jebarlin.R


On Mon, Feb 10, 2014 at 1:28 PM, Shai Erera  wrote:

> Hi
>
> If you want to drill-down on first name only, then you have several
> options:
>
> 1) Index Author/First, Author/Last, Author/First_Last as facets on the
> document. This is the faster approach, but bloats the index. Also, if you
> index the author Author/Jebarlin, Author/Robertson and
> Author/Jebarlin_Robertson, it still won't allow you to execute a query
> Author/Jebar.
>
> 2) You should modify the query to be a PrefixQuery, as if the user chose to
> search Author/Jeral*. You can do that with DrillDown.term() to create a
> Term($facets, Author/Jeral) (NOTE: you shouldn't pass '*' as part of the
> CategoryPath) and then construct your own PrefixQuery with that Term.
>
> Hope that helps,
> Shai
>
>
> On Mon, Feb 10, 2014 at 6:21 AM, Jebarlin Robertson wrote:
>
> > Dear Shai,
> >
> > I have one doubt about DrillDown search. When I search with a
> > CategoryPath of author, it gives me results only if I give the exact
> > full name. Is there any way to get results even if I give only the
> > first or last name? Can you help me search like this (facet *contains*
> > the word), if the latest API or any other API supports it?
> >
> > Thank You
> >
> > --
> > Thanks & Regards,
> > Jebarlin Robertson.R
> > GSM: 91-9538106181.
> >
>
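
To make the difference between an exact drill-down term and a prefix concrete, here is a small library-free sketch in plain Java. It does not call Lucene: the class name CategoryPrefixDemo and its methods are made up for illustration, and '/' is used as the path separator only because the thread writes paths that way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of exact vs. prefix matching on category terms.
// Not Lucene code: the class and method names are made up for illustration.
class CategoryPrefixDemo {

    // Exact drill-down: only a term equal to the indexed path matches.
    static List<String> exactMatches(List<String> indexedTerms, String term) {
        List<String> hits = new ArrayList<>();
        for (String indexed : indexedTerms) {
            if (indexed.equals(term)) {
                hits.add(indexed);
            }
        }
        return hits;
    }

    // Prefix drill-down: any indexed path that starts with the prefix matches.
    static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String indexed : indexedTerms) {
            if (indexed.startsWith(prefix)) {
                hits.add(indexed);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "Author/Jebarlin", "Author/Robertson", "Author/Jebarlin_Robertson");
        System.out.println(exactMatches(terms, "Author/Jebar"));  // []
        System.out.println(prefixMatches(terms, "Author/Jebar")); // [Author/Jebarlin, Author/Jebarlin_Robertson]
    }
}
```

Note that a prefix still cannot express "contains": a search on the last name only finds something here because Author/Robertson was indexed as its own path.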



-- 
Thanks & Regards,
Jebarlin Robertson.R
GSM: 91-9538106181.


Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
Hi

You will need to build a BooleanQuery which comprises a list of
PrefixQuery. The relation between each PrefixQuery should be OR or AND, as
you see fit (I believe OR?).
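
The OR versus AND choice can be pictured with a plain-Java model. This is only a sketch of the matching semantics, not Lucene code: PrefixClauseDemo and its methods are hypothetical names, and a document is reduced to the list of its category terms.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of combining several prefix clauses; not Lucene code.
class PrefixClauseDemo {

    // True if any of the document's category terms starts with the prefix.
    static boolean matchesPrefix(List<String> docTerms, String prefix) {
        for (String term : docTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // OR semantics: at least one prefix clause has to match.
    static boolean matchesAny(List<String> docTerms, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (matchesPrefix(docTerms, prefix)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every prefix clause has to match.
    static boolean matchesAll(List<String> docTerms, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (!matchesPrefix(docTerms, prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("Author/Jebarlin_Robertson", "Title/Lucene_Notes");
        List<String> prefixes = Arrays.asList("Author/Jebar", "Title/Solr");
        System.out.println(matchesAny(doc, prefixes)); // true
        System.out.println(matchesAll(doc, prefixes)); // false
    }
}
```

With OR, adding more category paths can only widen the result set; with AND, each extra path narrows it.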

In order to get documents' attributes you should execute searcher.search()
w/ e.g. MultiCollector which wraps a FacetsCollector and
TopScoreDocCollector. Then after .search() finished, you should pull the
facet results from the FacetsCollector instance and the document results
from the TopScoreDocCollector instance. Something like (I hope it compiles
in 3.6! :)):

TopScoreDocCollector tsdc = TopScoreDocCollector.create(...);
FacetsCollector fc = FacetsCollector.create(...);
searcher.search(query, MultiCollector.wrap(tsdc, fc));

List<FacetResult> facetResults = fc.getFacetResults();
TopDocs topDocs = tsdc.topDocs();

Something like that..
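
What the wrapping buys you is a single pass over the hits that feeds both collectors. The following plain-Java model shows that idea only; it is not Lucene's collector API, and HitCollector, DocListCollector, AuthorCountCollector and wrap() are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of fanning one search pass out to two collectors.
// Not Lucene code: all names here are hypothetical stand-ins.
class FanOutDemo {

    interface HitCollector {
        void collect(int docId, String author);
    }

    // Keeps the matching doc ids (stand-in for the document-results side).
    static class DocListCollector implements HitCollector {
        final List<Integer> docIds = new ArrayList<>();

        public void collect(int docId, String author) {
            docIds.add(docId);
        }
    }

    // Counts hits per author (stand-in for the facet-counting side).
    static class AuthorCountCollector implements HitCollector {
        final Map<String, Integer> counts = new TreeMap<>();

        public void collect(int docId, String author) {
            counts.merge(author, 1, Integer::sum);
        }
    }

    // Forwards every hit to all wrapped collectors.
    static HitCollector wrap(HitCollector... collectors) {
        return (docId, author) -> {
            for (HitCollector collector : collectors) {
                collector.collect(docId, author);
            }
        };
    }

    public static void main(String[] args) {
        DocListCollector docs = new DocListCollector();
        AuthorCountCollector facets = new AuthorCountCollector();
        HitCollector both = wrap(docs, facets);
        String[] authors = {"A1", "A2", "A1"};
        for (int docId = 0; docId < authors.length; docId++) {
            both.collect(docId, authors[docId]);
        }
        System.out.println(docs.docIds);   // [0, 1, 2]
        System.out.println(facets.counts); // {A1=2, A2=1}
    }
}
```

After the pass, each collector is read independently, which mirrors pulling the facet results from one instance and the documents from the other.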

Shai




Re: Regarding DrillDown search

2014-02-10 Thread Jebarlin Robertson
Hi Shai,

Thanks,

I am already doing it that way, with a BooleanQuery over a list of
PrefixQuery clauses. I think I confused you, sorry :) .

I am also using the code above to get the documents. I am getting the
TopDocs and retrieving the Documents as well. If I didn't even try that
basic step you would kill me :D.
But my question was different: after iterating the list of FacetResult, I
am getting only the counts (number of hits) of documents in each category.
I believe that getLevel() of FacetNode returns the number of hits, i.e. the
number of documents that fall into the particular category.
I also need to know, from the FacetResult object, which documents fall
under each category.

I hope you will understand my question :)

Thank you :)

--
Jebarlin





Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
Ahh I see ... so given a single FacetResultNode, you would like to know
which documents contributed to its weight (count in your case). This is not
available immediately, that's why you need to do a drill-down query. So if
you return the user a list of categories, when he clicks one of them, you
perform a drill-down query on that category and retrieve all the associated
documents.

May I ask why you need to know the list of documents given a
FacetResultNode?

Basically, in the 3.6 API it's not so simple to do what you want in one
pass, but in the 4.x API (especially the upcoming 4.7) it should be very
easy: when you traverse the list of matching documents, besides reading
the list of categories associated with each one, you also store a map
Category -> List. This isn't very cheap though ...
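
A rough picture of that bookkeeping, in plain Java rather than the facet API: the sketch assumes the list in the map holds document ids, and CategoryDocsDemo and docsByCategory() are hypothetical names. Each matching document is represented only by the list of its categories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of recording which documents contributed to each category.
// Not Lucene code; the map value is assumed to be a list of doc ids.
class CategoryDocsDemo {

    // docCategories.get(i) holds the categories of matching document i.
    static Map<String, List<Integer>> docsByCategory(List<List<String>> docCategories) {
        Map<String, List<Integer>> byCategory = new LinkedHashMap<>();
        for (int docId = 0; docId < docCategories.size(); docId++) {
            for (String category : docCategories.get(docId)) {
                byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(docId);
            }
        }
        return byCategory;
    }

    public static void main(String[] args) {
        List<List<String>> docCategories = Arrays.asList(
                Arrays.asList("Author/A1"),
                Arrays.asList("Author/A2"),
                Arrays.asList("Author/A1"));
        System.out.println(docsByCategory(docCategories)); // {Author/A1=[0, 2], Author/A2=[1]}
    }
}
```

The facet count is then just the size of each list. The cost is one stored entry per (document, category) pair over all matching documents, which is why this gets expensive on large result sets.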

So I guess it would be good if I understood why you need to know which
documents contributed to which category before the results are returned to
the user.

Shai



Faceting and Query-time Joins

2014-02-10 Thread Jon Stewart
Hello,

tl;dr: I'd like to know how to do faceting over the result set of a
query-time join (JoinUtil). If it's not currently supported by
Lucene, I'd appreciate some pointers about what needs to be done.

I'm working on a greenfield project with Lucene 4.6. The application
treats its primary objects as a collection of child records. The child
records are of different types and, unfortunately, are not available
all at once (ruling out BlockJoinQuery). As the child records roll
into the system for indexing, they're represented as Lucene Document
objects that have the primary key of the parent object as a field. The
child records themselves never change, so there's no need for
re-indexing. I can use query-time joins on the parent ID field. So
far, so good.

The problem is that I also very much want to have faceting pertaining
to the parent objects. Googling around the past couple days hasn't
revealed much discussion of how to combine facets with query-time
joins (except "nope": http://search-lucene.com/m/QTPadBcnv1). Is it
possible to combine these two features with the above constraints? If
so, how? If not in Lucene 4.6, is there related work in trunk? One
thing I was thinking about last night is that it wouldn't seem to be
too hard to do the faceting for this case by using updatable
NumericDocValues on a dummy parent object, since that shouldn't require
re-indexing.
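
To spell out the computation being asked for, here is a plain-Java model. It says nothing about whether Lucene 4.6 supports this directly; it only shows the two steps (join to distinct parents, then count a facet value per parent). JoinFacetDemo, facetCounts() and the single-valued parentFacet map are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of faceting over the parents reached by a query-time join.
// Not Lucene code; assumes each parent carries exactly one facet value.
class JoinFacetDemo {

    // childParentIds: the parent key of every child record that matched the query.
    // parentFacet: the facet value stored on each (dummy) parent object.
    static Map<String, Integer> facetCounts(List<String> childParentIds,
                                            Map<String, String> parentFacet) {
        // Join step: several matching children may share a parent, so deduplicate.
        Set<String> parents = new LinkedHashSet<>(childParentIds);
        // Facet step: count each parent once under its facet value.
        Map<String, Integer> counts = new TreeMap<>();
        for (String parentId : parents) {
            String value = parentFacet.get(parentId);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> parentFacet = new HashMap<>();
        parentFacet.put("p1", "pdf");
        parentFacet.put("p2", "pdf");
        parentFacet.put("p3", "email");
        List<String> matchingChildren = Arrays.asList("p1", "p1", "p2", "p3");
        System.out.println(facetCounts(matchingChildren, parentFacet)); // {email=1, pdf=2}
    }
}
```

The deduplication is the part that plain per-document facet counting over the child hits would get wrong, since p1 would be counted twice.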

TIA,

Jon
-- 
Jon Stewart, Principal
(646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA




Re: Regarding DrillDown search

2014-02-10 Thread Jebarlin Robertson
Hi Shai,

Thanks for the explanation :)

For my requirement, I just want to display the list of matching documents
to the user.
In the facet search case, too, I already have the matching documents in
TopDocs, and the FacetResults have only the count of documents that
contributed to each category.

As I understand it:

Suppose I query for the word "Love". I do a facet search and get 4
documents (files) as matches from the TopScoreDocCollector, as TopDocs,
and I get the FacetResult from the FacetsCollector.
The FacetResultNode gives me only the category values and the count of how
many of those 4 documents fall under each category (by Author or any other
provided category).

I feel it would be good to get the category association for the matching
documents, since I already have the document list from TopScoreDocCollector.

I can also do a DrillDown search by selecting each category, but in my case
I just want to display the 4 matching documents first and then show them
category-wise, for example 2 documents by the same Author, etc.

As per my requirement, I do the DrillDown search by asking the user to
provide the title of the document, the author of the document, etc. as
advanced search options.

---
Jebarlin Robertson.R




Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
What you want sounds more like grouping than faceting?

So e.g. if you have an Author field with values A1, A2, A3, and the user
searches for 'love',
then if I understand correctly, you want to display something like:

Author/A1
  Doc1
  Doc2
Author/A2
  Doc3
  Doc4
Author/A3
  Doc5
  Doc6

Is that right?


Whereas today your result page looks like this:

Facets     Results
------     -------
Author     Doc1_Title
  A1 (4)   Doc1_Highlight
  A2 (3)
  A3 (1)   Doc2_Title
           Doc2_Highlight
           +++
           ...

(Forgive my lack of creativity :)).

If you're not interested in join, and just want to add to each document its
Author facet in the results pane, then I suggest you add another stored
field (only stored, not indexed) with the category value. And then you
could display:

Facets     Results
------     -------
Author     Doc1_Title
  A1 (4)   Doc1_Highlight
  A2 (3)   Author: A1
  A3 (1)
           Doc2_Title
           Doc2_Highlight
           Author: A2
           +++
           ...
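
Once each hit carries its author as a stored value, the grouped layout shown earlier can also be produced from the same hits without another query. A minimal plain-Java sketch, assuming the title and author have already been read from the retrieved documents; Hit and groupByAuthor() are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of grouping retrieved hits by a stored author value.
// Not Lucene code: Hit stands in for a retrieved document's stored fields.
class GroupByAuthorDemo {

    // One retrieved hit: its title plus the author read from a stored field.
    static class Hit {
        final String title;
        final String author;

        Hit(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    // Groups titles by author, keeping the order in which hits were ranked.
    static Map<String, List<String>> groupByAuthor(List<Hit> rankedHits) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            groups.computeIfAbsent(hit.author, k -> new ArrayList<>()).add(hit.title);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("Doc1", "A1"), new Hit("Doc2", "A2"), new Hit("Doc3", "A1"));
        System.out.println(groupByAuthor(hits)); // {A1=[Doc1, Doc3], A2=[Doc2]}
    }
}
```

This only groups the hits that were actually retrieved, so the group sizes can be smaller than the facet counts when the result list is cut off at the top N.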

Did I understand properly?

Shai

Re: Regarding DrillDown search

2014-02-10 Thread Jebarlin Robertson
Hi Shai,

Yes, that is exactly how I want to display it.

Then I will go with the stored field approach.

It is not about lack of creativity; I might not have explained it to you
properly :)

Thank you for all the support :)



Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
You're welcome. And I suggest that you upgrade to 4.7 as soon as it's out!
:)

Shai
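
The stored-field suggestion quoted below can be turned into an "Author -> documents" view without a second query. The following is only a minimal sketch in plain Java: the Hit class and its title/author members are hypothetical stand-ins for values that would be read back from stored fields of each top hit, and nothing here is part of the Lucene facet API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search hit whose stored fields were already
// loaded (in Lucene these values would come from stored "title" and
// "author" fields of the matching document).
final class Hit {
    final String title;
    final String author;

    Hit(String title, String author) {
        this.title = title;
        this.author = author;
    }
}

final class GroupByAuthor {
    // Builds author -> titles, preserving hit order, so the document names
    // per author come out of one pass over the hits.
    static Map<String, List<String>> group(List<Hit> hits) {
        Map<String, List<String>> byAuthor = new LinkedHashMap<>();
        for (Hit hit : hits) {
            byAuthor.computeIfAbsent(hit.author, k -> new ArrayList<>())
                    .add(hit.title);
        }
        return byAuthor;
    }
}
```

Note that this groups only the hits that were actually retrieved (for example the 4 top documents), whereas facet counts cover every matching document, so the two numbers can differ when there are more matches than retrieved hits.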


On Mon, Feb 10, 2014 at 5:48 PM, Jebarlin Robertson wrote:

> Hi Shai,
>
> Yeah exactly the same way I want to display.
>
> Then I will do the same way of stored field.
>
> It is not about lack of creativity, I might have not explained you in the
> proper way :)
>
> Thank you for all the support :)
>
>
> On Tue, Feb 11, 2014 at 12:23 AM, Shai Erera  wrote:
>
> > What you want sounds more like grouping than faceting?
> >
> > So e.g. if you have an Author field with values A1, A2, A3, and the user
> > searches for 'love',
> > then if I understand correctly, you want to display something like:
> >
> > Author/A1
> >   Doc1
> >   Doc2
> > Author/A2
> >   Doc3
> >   Doc4
> > Author/A3
> >   Doc5
> >   Doc6
> >
> > Is that right?
> >
> >
> > Whereas today your result page looks like this:
> >
> > Facets     Results
> > ------     -------
> > Author     Doc1_Title
> >   A1 (4)   Doc1_Highlight
> >   A2 (3)
> >   A3 (1)   Doc2_Title
> >            Doc2_Highlight
> >            +++
> >            ...
> >
> > (Forgive my lack of creativity :)).
> >
> > If you're not interested in join, and just want to add to each document
> its
> > Author facet in the results pane, then I suggest you add another stored
> > field (only stored, not indexed) with the category value. And then you
> > could display:
> >
> > Facets     Results
> > ------     -------
> > Author     Doc1_Title
> >   A1 (4)   Doc1_Highlight
> >   A2 (3)   Author: A1
> >   A3 (1)
> >            Doc2_Title
> >            Doc2_Highlight
> >            Author: A2
> >            +++
> >            ...
> >
> > Did I understand properly?
> >
> > Shai
> >
> > On Mon, Feb 10, 2014 at 4:51 PM, Jebarlin Robertson wrote:
> >
> > > Hi Shai,
> > >
> > > Thanks for the explanation :)
> > >
> > > For my requirement, I just want to display the list of resulting
> > > documents to the user.
> > > In the facet search case, I already have the resulting document list
> > > in TopDocs, and the FacetResults have only the count of documents that
> > > contributed to each category.
> > >
> > > According to my understanding:
> > >
> > > Suppose I query for the word "Love". I do a facet search and get 4
> > > documents (files) as matched results from TopScoreDocCollector as
> > > TopDocs, and I get the FacetResult from the FacetCollector.
> > > The FacetResultNode gives me only the category values and the count of
> > > how many documents fall under each category (maybe by Author or other
> > > provided categories) among those 4 resulting documents.
> > >
> > > I feel it would be good to get the category association with the
> > > resulting documents, as I already have the document list from
> > > TopScoreDocCollector.
> > >
> > > I can also do a DrillDown search by selecting each category, but in my
> > > case I just want to display the 4 resulting documents first and then
> > > show them category-wise, e.g. 2 documents by the same Author, etc.
> > >
> > > As per my requirement, I am doing the DrillDown search by asking the
> > > user to provide values such as the title of the document, the author
> > > of the document, etc. as an advanced search option.
> > >
> > > ---
> > > Jebarlin Robertson.R
> > >
> > >
> > >
> > > On Mon, Feb 10, 2014 at 10:30 PM, Shai Erera  wrote:
> > >
> > > > Ahh I see ... so given a single FacetResultNode, you would like to
> know
> > > > which documents contributed to its weight (count in your case). This
> is
> > > not
> > > > available immediately, that's why you need to do a drill-down query.
> So
> > > if
> > > > you return the user a list of categories, when he clicks one of them,
> > you
> > > > perform a drill-down query on that category and retrieve all the
> > > associated
> > > > documents.
> > > >
> > > > May I ask why do you need to know the list of documents given a
> > > > FacetResultNode?
> > > >
> > > > Basically in the 3.6 API it's kind of not so simple to do what you
> want
> > > in
> > > > one-pass, but in the 4.x API (especially the upcoming 4.7) it should
> be
> > > > very easy -- when you traverse the list of matching documents,
> besides
> > > only
> > > > reading the list of categories associated with it, you also store a
> map
> > > > Category -> List. This isn't very cheap though ...
> > > >
> > > > So I guess it would be good if I understand why do you need to know
> > which
> > > > documents contributed to which category, before the results are
> > returned
> > > to
> > > > the user.
> > > >
> > > > Shai
> > > >
> > > >
> > > > On Mon, Feb 10, 2014 at 3:16 PM, Jebarlin Robertson <
> > jebar...@gmail.com
> > > > >wrote:
> > > >
> > > > > Hi Shai,
> > > > >
> > > > > Thanks,
> > > > >
> > > > > I am using the same way of BooleanQuery only with list of
> PrefixQuery
> > > > only.
> > > > > 

Re: Faceting and Query-time Joins

2014-02-10 Thread Michael McCandless
Are you faceting on parent values or child values?

Parent values should be easy; child values is not.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Feb 10, 2014 at 9:05 AM, Jon Stewart
 wrote:
> Hello,
>
> tl;dr: I'd like to know how to do faceting over the result set of a
> query-time join (JoinUtils). If it's not currently supported by
> Lucene, I'd appreciate some pointers about what needs to be done.
>
> I'm working on a greenfields project with Lucene 4.6. The application
> treats its primary objects as a collection of child records. The child
> records are of different types and, unfortunately, are not available
> all at once (ruling out BlockJoinQuery). As the child records roll
> into the system for indexing, they're represented as Lucene Document
> objects that have the primary key of the parent object as a field. The
> child records themselves never change, so there's no need for
> re-indexing. I can use query-time joins on the parent ID field. So
> far, so good.
>
> The problem is that I also very much want to have faceting pertaining
> to the parent objects. Googling around the past couple days hasn't
> revealed much discussion of how to combine facets with query-time
> joins (except "nope": http://search-lucene.com/m/QTPadBcnv1). Is it
> possible to combine these two features with the above constraints? If
> so, how? If not in Lucene 4.6, is there related work in trunk? One
> thing I was thinking about last night is that it wouldn't seem to be
> too hard to do the faceting for this case by using update-able
> NumericDocValue on a dummy parent object, since that shouldn't require
> re-indexing.
>
> TIA,
>
> Jon
> --
> Jon Stewart, Principal
> (646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>




New to Apache Lucene: Need help in querying data - text with wildCards

2014-02-10 Thread gudiseashok
I have a log-analyzer application that uses Apache Lucene to index its
data. Only the message is stored in the index (none of the other fields in
my object are stored). I am not using a database, so I store the message in
Lucene even though it is huge, but I delete this data weekly and start
indexing afresh.
I have created a domain object to make indexing and retrieving my data with
Lucene easier.
The object has these fields:
className (fully qualified class name including package, for example
com.domain.infrastructure.MyClass)
messageType (for example: xml, log message, exception)
logLevel
timestamp (stored as a Long)
logMessage (contains text and special characters such as <, [, { etc.)
The main purpose is to retrieve logMessage based on a user request. A few
scenarios:

Case 1: The user requests a SOAP message (messageType:XML) at a particular
time (timestamp:longVariable).
Case 2: The user requests a particular message (messageType:logMessage) at
a particular time (timestamp:longVariable) from a particular class
(className:com.businessdomain.layer.MyClass).
Case 3: The user requests a particular message (messageType:Exception) at a
given log level (logLevel:DEBUG) at a particular time
(timestamp:longVariable).

Currently I am indexing data like this:

document.add(new StringField("className", logsVO.getClassName(),
        Field.Store.NO));
document.add(new StringField("logLevel", logsVO.getLogLevel(),
        Field.Store.NO));
document.add(new TextField("logMessage", logsVO.getLogMessage(),
        Field.Store.YES));
document.add(new StringField("messageType",
        logsVO.getMessageType().toString(), Field.Store.NO));
document.add(new NumericDocValuesField("path", logsVO.hashCode()));
document.add(new LongField("timeStamp", logsVO.getTimeStamp().getTime(),
        Field.Store.NO));

An actual log line looks like this:
2013-12-19 15:53:42.379 [server.startup : 0]  DEBUG
o.a.commons.digester3.Digester -
[ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop
'com.domain.ec.util.mapper.node.SomeClass'
Here:
2013-12-19 15:53:42.379 is the timestamp
[server.startup : 0] is ignored
DEBUG is the logLevel
o.a.commons.digester3.Digester is the className
[ObjectCreateRule]{maplist/recvmap/recvfrag/recvfragoccurs/recvprop} Pop
'com.domain.ec.util.mapper.node.SomeClass' is the logMessage

Now to my problem: I have tried PhraseQuery, BooleanQuery and
WildcardQuery, but the only time I get results is when I search for a small
string like "pop" (from the logMessage above). In all other cases, where
the search string contains special characters, I get no results. Can anyone
suggest what kind of query I should use to satisfy the three user-request
cases above?

I appreciate your help in this regard.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/New-to-Apache-Lucene-Need-help-in-querying-data-text-with-wildCards-tp4116515.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: New to Apache Lucene: Need help in querying data - text with wildCards

2014-02-10 Thread Michael McCandless
Likely your analyzer (which one are you using?) is breaking up your
text into tokens you don't expect?

If you use QueryParser, passing the same analyzer, then it will also
tokenize your query into the same tokens, and you should get the
expected hits.

But you may need your own analyzer to "properly" (by your definition)
tokenize the log messages...
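
To see why a short word such as "pop" matches while a string with special characters does not, it can help to look at tokens rather than raw text. The sketch below is plain Java and uses a deliberately simplified stand-in for an analyzer (lowercase, split on anything that is not a letter or digit). It is not StandardAnalyzer's actual algorithm, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class SimpleAnalysisDemo {
    // Simplified stand-in for an analyzer: lowercases the text and splits on
    // any run of characters that are not letters or digits.
    // This is NOT StandardAnalyzer's algorithm.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // True if the raw query string is itself one of the indexed tokens,
    // which is roughly what an un-analyzed term lookup amounts to.
    static boolean matchesRaw(String indexedText, String rawQuery) {
        return tokenize(indexedText).contains(rawQuery);
    }

    // True if the query, tokenized the same way as the indexed text, occurs
    // as a consecutive run of tokens (a phrase-style match).
    static boolean matchesAnalyzed(String indexedText, String query) {
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(indexedText), queryTokens) >= 0;
    }
}
```

With the log message from this thread, matchesRaw finds "pop" but not "[ObjectCreateRule]", because the brackets never make it into a token. matchesAnalyzed finds "[ObjectCreateRule]{maplist/recvmap" because the query goes through the same tokenization as the indexed text.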

Mike McCandless

http://blog.mikemccandless.com





Re: New to Apache Lucene: Need help in querying data - text with wildCards

2014-02-10 Thread gudiseashok
Hi Michael

Thank you very much for your response. Before trying your suggestion
(writing my own analyzer): I am currently just using StandardAnalyzer, as
in the example in the Lucene documentation. Here is my code that writes to
Lucene:

[code]
// Imports needed at the top of the class file (Lucene 4.6):
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

File indexDirFile = new File(this.indexDir);
Directory dir = FSDirectory.open(indexDirFile);

// The Version argument is used by certain classes to match version
// compatibility across Lucene releases. When changing it, do not change it
// only at search time: adjust the indexing code to match, and re-index.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);

// IndexWriterConfig holds all the configuration used to create an
// IndexWriter. Once the IndexWriter has been created, changes to this
// object do not affect it; use the LiveIndexWriterConfig returned from
// IndexWriter.getConfig() for that.
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);

if ((overrideOpenModeFlag && openModeFlag) || create) {
    // Start a fresh index, discarding any existing one
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
    // Append to an existing index, or create it if missing
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}

this.writer = new IndexWriter(dir, iwc);
[/code]


and my IndexSearcher is created like this:

[code]
IndexReader indexReader = null;
IndexSearcher indexSearcher = null;
try {
    File indexDirFile = new File(this.indexDir);
    Directory dir = FSDirectory.open(indexDirFile);
    indexReader = DirectoryReader.open(dir);
    indexSearcher = new IndexSearcher(indexReader);
} catch (IOException ioe) {
    // If the index cannot be opened, indexSearcher stays null
    ioe.printStackTrace();
}

this.indexSearcher = indexSearcher;
[/code]

Please advise.









Re: Faceting and Query-time Joins

2014-02-10 Thread Jon Stewart
Child values. There really isn't a true parent, other than to refer to
the collection of children. However, the children are all of different
types and I'd expect that a given facet would only pertain to a given
child, i.e., you're not going to get a second child which involves the
same facet.

A bit unusual is that I care more about indexing performance and less
about query latency. I don't want queries that take too long, of
course, but a second or two or three is fine, and I don't expect much,
if any, concurrency (nor is adding RAM a problem :-). So having to do a
little more work at query time isn't that big of a concern if I can avoid
re-indexing.


Jon
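
One way to picture the "a little more work at query time" trade-off is to count child facet values only for children whose parent is in the joined result set. The sketch below is plain in-memory Java, not the Lucene join or facet API. ChildRecord, its members and JoinFacetCounter are hypothetical names, and it assumes, as described above, that a given facet appears on at most one child per parent, so child counts equal parent counts.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical child record: the parent's primary key plus one facet value.
final class ChildRecord {
    final String parentId;
    final String facetValue;

    ChildRecord(String parentId, String facetValue) {
        this.parentId = parentId;
        this.facetValue = facetValue;
    }
}

final class JoinFacetCounter {
    // Counts facet values over the children whose parent ID is in the set
    // of parents matched by the join. Runs in time linear in the number of
    // children scanned.
    static Map<String, Integer> count(Set<String> matchingParentIds,
                                      List<ChildRecord> children) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ChildRecord child : children) {
            if (matchingParentIds.contains(child.parentId)) {
                counts.merge(child.facetValue, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Nothing is re-indexed here. The cost moves entirely to query time, which fits the stated priorities, though whether it scales depends on how many child records have to be visited per query.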





-- 
Jon Stewart, Principal
(646) 719-0317 | j...@lightboxtechnologies.com | Arlington, VA




[Suggestions Required] 110 Concurrency users indexing on Lucene dont finish in 200 ms.

2014-02-10 Thread Umashanker, Srividhya
Group -



We have an indexing and searching service (using Lucene 4.0) implemented
over REST as part of our framework. All related modules will use it to
publish data and make it available to the UI.

Moreover, every REST call that our service receives has a proxy timeout
limit of 200 ms.

Recently we started concurrency tests on our service, beginning with
indexing. The service has to handle a minimum of 110 users, each publishing
JSON data of 10, 50 or 100 documents. The documents are not huge like the
ones you use for your performance tests. Mostly we call updateDocument(),
as we do not know whether a document is already indexed or not.



110 users/threads with a ramp-up time of 0 seconds fail randomly because
some of the threads/REST calls do not complete within 200 ms.



We tried the following:

1. maxBufferSize of 500 MB.

2. DWPT count set to 10.

3. Tried calling updateDocument in a separate thread, as suggested in one of
your posts on the internet. It doesn't seem to help.

4. TieredMergePolicy has been set. Unfortunately this does not help, as the
maxBufferSize is not hit while we index, so no merge happens yet.

5. We are yet to try MMapDirectory.
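
For item 3, the post does not say how the separate thread was wired in. One possible shape, assuming it is acceptable for the REST call to return before the update has been applied, is a bounded queue drained by a single background worker. The sketch below is plain Java: AsyncIndexQueue is a hypothetical name, and the Runnable stands in for the real IndexWriter.updateDocument call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncIndexQueue {
    private final ExecutorService worker;
    private final AtomicInteger applied = new AtomicInteger();

    AsyncIndexQueue(int capacity) {
        // One worker thread draining a bounded queue. When the queue is
        // full, execute() throws RejectedExecutionException, which gives
        // the caller a back-pressure signal.
        worker = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(capacity));
    }

    // Called from the request thread: enqueue the update and return at once.
    void submit(Runnable indexUpdate) {
        worker.execute(() -> {
            indexUpdate.run();
            applied.incrementAndGet();
        });
    }

    // Stops accepting work and waits for the queued updates to finish.
    boolean drain(long timeout, TimeUnit unit) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int appliedCount() {
        return applied.get();
    }
}
```

The trade-off is that a caller is acknowledged before its documents are indexed, so a crash can lose queued updates and a reader may not see them immediately. Whether that is acceptable depends on what the 200 ms limit is meant to guarantee.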



Are there any other suggestions/experiences that you can share with us?
Have you done any benchmark tests on concurrency?



Please help us.



-Vidhya