Re: How Can I modify the DocList and DocSet in solr

2014-06-10 Thread Vishnu Mishra
Thanks for the reply. I found one solution to modify the DocList and DocSet
after searching. Look at the following code snippet.

// Needed imports: java.io.IOException, java.util.*,
// org.apache.lucene.document.Document,
// org.apache.solr.handler.component.ResponseBuilder, and from
// org.apache.solr.search: DocIterator, DocList, DocSlice, SolrIndexSearcher.
private void sortByRecordIDNew(SolrIndexSearcher.QueryResult result,
    ResponseBuilder rb) throws IOException {

  DocList docList = result.getDocListAndSet().docList;

  // projectSort and dbData are fields of the enclosing component.
  // Map database record IDs to Lucene doc IDs, in ascending or
  // descending record-ID order depending on the requested direction.
  SortedMap<Integer, Integer> sortedMap;
  if (projectSort == 0) {
    sortedMap = new TreeMap<Integer, Integer>(Collections.reverseOrder());
  } else {
    sortedMap = new TreeMap<Integer, Integer>();
  }

  DocIterator iterator = docList.iterator();
  while (iterator.hasNext()) {
    int docId = iterator.nextDoc();

    Document d = rb.req.getSearcher().doc(docId);
    // dbData maps the unique key from schema.xml ("ID") to the record ID
    // from the database. Note: documents sharing a record ID overwrite
    // each other here.
    Integer val = dbData.get(d.get("ID"));

    sortedMap.put(val, docId);
  }

  // Rebuild the doc ID array in record-ID order; the original scores are
  // not preserved by this re-sort.
  float[] scores = new float[docList.size()];
  int[] docs = new int[docList.size()];
  int docCounter = 0;
  float maxScore = 0.0f;

  for (Map.Entry<Integer, Integer> entry : sortedMap.entrySet()) {
    scores[docCounter] = 1.0f;
    docs[docCounter] = entry.getValue();
    docCounter++;
  }

  docList = new DocSlice(0, docCounter, docs, scores, docList.matches(), maxScore);

  result.setDocList(docList);
}


Call this method from QueryComponent's process method after the search runs.
In the code above, the DocList is sorted in ascending or descending order,
depending on the user's requirement. It works for me.
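
For orientation, the call site could look like this, assuming you are editing
a copy of Solr 4.x's QueryComponent.process() (variable names follow that
method):

// near the end of QueryComponent.process(), after the search has run:
searcher.search(result, cmd);    // the normal search call
sortByRecordIDNew(result, rb);   // re-order the DocList before it is published
rb.setResult(result);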






Can we do conditional boosting using edismax?

2014-06-10 Thread Shamik Bandopadhyay
Hi,

  I'm using the edismax parser to perform runtime boosting. Here's my sample
request handler entry:

<str name="qf">text^2 title^3</str>
<str name="bq">Source:Blog^3 Source2:Videos^2</str>
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>

As you can see, I'm adding weights to text and title, as well as boosting
on source. What I'm trying to find out is whether there's a way to change the
weights based on Source. E.g. for source "Blog" I would like the
boost "text^3 title^2", while for source "Videos" I prefer
"text^2 title^3".

Any pointers will be appreciated.

Thanks,
Shamik


Re: Fw: highlighting on hl.alternateField (copyField target) doesn't highlight

2014-06-10 Thread jay list
Answer to myself:
using solr.KeywordTokenizerFactory together with solr.WordDelimiterFilterFactory
preserves the original phone number and also adds a token that contains no
spaces.

input:  "12345 67890"
tokens: "12345 67890", "12345", "67890", "1234567890"

Two advantages: I don't need another field, and the highlighter works as
expected.
Best Regards.
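
For reference, a fieldType along these lines produces exactly those tokens
(a sketch; the exact WordDelimiterFilter flags are assumptions):

<fieldType name="phone" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- preserveOriginal keeps "12345 67890", generateNumberParts emits
         "12345" and "67890", catenateNumbers adds "1234567890" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateNumberParts="1"
            catenateNumbers="1"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>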

> Sent: Thursday, June 5, 2014 at 9:14 AM
> From: "jay list" 
> To: solr-user@lucene.apache.org
> Subject: Fw: highlighting on hl.alternateField (copyField target) doesn't 
> highlight
>
> Anybody knowing this issue?
> 
> > Sent: Tuesday, June 3, 2014 at 9:11 AM
> > From: "jay list" 
> > To: solr-user@lucene.apache.org
> > Subject: highlighting on hl.alternateField (copyField target) doesn't 
> > highlight
> >
> > 
> > Hello,
> >  
> > I'm trying to implement a user-friendly search for phone numbers. These 
> > numbers consist of two digit tokens like "12345 67890".
> >  
> > Finally I want highlighting for the phone number in the search result, 
> > without any concern about whether the hit came from field  tel  or 
> > copyField  tel2.
> >  
> > The field tel is split by a StandardTokenizer into the two tokens "12345" 
> > AND "67890".
> > And I want to catch those people who enter "1234567890" without any 
> > space.
> > I use copyField to  tel2 , which runs a solr.PatternReplaceCharFilterFactory 
> > to eliminate non-digits, followed by a solr.KeywordTokenizerFactory.
> >  
> > In both cases the search hits as expected.
> >  
> > The highlighter works well for  tel  or  tel2,  but I want the highlight 
> > always on field  tel!
> > Using  f.tel.hl.alternateField=tel2  returns the field value without 
> > any highlighting.
> >  
> > Request parameters (XML markup lost in the archive):
> >   q  = tel2:1234567890
> >   f.tel.hl.alternateField = tel2
> >   hl = true
> >   fl = tel,tel2
> >   hl.fl = tel,tel2
> >   wt = xml
> >   fq = typ:person
> > 
> > The matching document in the response:
> >   id   = user1
> >   tel  = 12345 67890
> >   tel2 = 12345 67890
> > 
> > The highlighting section returns "123456 67890" for both fields,
> > without any highlight markup.
> > 
> > Any idea? Or do I have to change my velocity macros, always looking for a 
> > different highlighted field?
> > Best Regards
>


Re: Recommended ZooKeeper topology in Production

2014-06-10 Thread Steve McKay
Dedicated machines are a good idea. The main thing is to make sure that ZK 
always has IOPS available for transaction log writes. That's easy to ensure 
when each ZK instance has its own hardware. The standard practice, as far as I 
know, is to have 3 physical boxes spread among racks/datacenters/continents as 
HA needs dictate.
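
For reference, a minimal zoo.cfg for such a 3-node ensemble might look like
this (a sketch; hostnames, ports, and paths are assumptions):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Unless dataLogDir is set, the transaction log lives under dataDir, so that is
the path that needs the guaranteed IOPS mentioned above.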

Sharing a machine between Solr and ZK is definitely not ideal. Instead of Solr 
machines and ZK machines, now you have Solr machines and Solr+ZK machines. It 
adds management overhead because now you have to take ZK into account while 
administering your Solr cluster, and unless you give ZK its own disk it will 
have to compete with Solr for I/O.

On Jun 10, 2014, at 2:58 AM, Gili Nachum  wrote:

> Is there a recommended ZooKeeper topology for production Solr environments?
> 
> I was planning: 3 ZK nodes, each on its own dedicated machine.
> 
> Thinking that dedicated machines, separate from Solr servers, would keep ZK
> isolated from resource contention spikes that may occur on Solr. Also, if a
> Solr machine goes down, there would still be 3 ZK nodes to handle the event
> properly.
> 
> If I want to save on resources, is placing each ZK instance on the same box
> as a Solr instance considered common practice in production environments?
> 
> Thanks!



RE: Format version is not supported error

2014-06-10 Thread Joshi, Shital
Yes that was the problem. Switching back works now. Thanks!

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Tuesday, June 10, 2014 4:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Format version is not supported error

On 6/10/2014 1:17 PM, Joshi, Shital wrote:
> We upgraded from Solr version 4.4 to 4.8. In doing so we also upgraded from 
> JDK 1.6 to 1.7. After a few days of testing, we decided to move back to 4.4. 
> We get the following error on all nodes and our cloud is not usable. How do 
> we fix it?
>
> Format version is not supported (resource: 
> MMapIndexInput(path="/local/data/solr1/index.20140324041707963/segments.gen")):
>  -3 (needs to be between -2 and -2)
>
> We tried to switch back to 4.8 but get the same error. 

This would mean that your index is at least partially built by the 4.8
version.  I believe the default index format changed in version 4.5, so
any index created or modified by 4.8 will not work in 4.4.

It should have worked once you upgraded back to 4.8, so what I think
might be happening is that you still have leftovers from the 4.4 .war
extraction, or possibly some older Lucene jars on your classpath that
are getting loaded instead of the 4.8 jars.  You'll want to remove any
extra lucene or solr jars that are hanging around, as well as any
extracted .war file contents in your servlet container.  For the example
container, you can find these extracted files in the "solr-webapp"
directory.

Thanks,
Shawn



Re: Format version is not supported error

2014-06-10 Thread Shawn Heisey
On 6/10/2014 1:17 PM, Joshi, Shital wrote:
> We upgraded from Solr version 4.4 to 4.8. In doing so we also upgraded from 
> JDK 1.6 to 1.7. After a few days of testing, we decided to move back to 4.4. 
> We get the following error on all nodes and our cloud is not usable. How do 
> we fix it?
>
> Format version is not supported (resource: 
> MMapIndexInput(path="/local/data/solr1/index.20140324041707963/segments.gen")):
>  -3 (needs to be between -2 and -2)
>
> We tried to switch back to 4.8 but get the same error. 

This would mean that your index is at least partially built by the 4.8
version.  I believe the default index format changed in version 4.5, so
any index created or modified by 4.8 will not work in 4.4.

It should have worked once you upgraded back to 4.8, so what I think
might be happening is that you still have leftovers from the 4.4 .war
extraction, or possibly some older Lucene jars on your classpath that
are getting loaded instead of the 4.8 jars.  You'll want to remove any
extra lucene or solr jars that are hanging around, as well as any
extracted .war file contents in your servlet container.  For the example
container, you can find these extracted files in the "solr-webapp"
directory.

Thanks,
Shawn



Re: Large disjunction query practices

2014-06-10 Thread Joe Gresock
They're really just seeking a bulk read of the data.  The user has 300K+
terms they want to search in the textual data.  It seems to be a pretty
rare case.  I think your answers have given me some good info.
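
For later readers, a rough SolrJ sketch of that client-side batching (the
chunk size, field name, and row limit are assumptions, not recommendations
from this thread):

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class BatchedDisjunction {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    // stand-in for the real 300K+ term list
    List<String> terms = Arrays.asList("term1", "term2", "term3");
    int batch = 500; // stays well under the default maxBooleanClauses of 1024
    // merge client-side, de-duplicated by unique key
    Map<Object, SolrDocument> merged = new LinkedHashMap<Object, SolrDocument>();
    for (int i = 0; i < terms.size(); i += batch) {
      List<String> chunk = terms.subList(i, Math.min(i + batch, terms.size()));
      StringBuilder q = new StringBuilder("text:(");
      for (int j = 0; j < chunk.size(); j++) {
        if (j > 0) q.append(" OR ");
        q.append(chunk.get(j)); // terms assumed to need no query escaping
      }
      q.append(')');
      SolrQuery query = new SolrQuery(q.toString());
      query.setRows(1000); // real code would page through all hits
      for (SolrDocument doc : solr.query(query).getResults()) {
        merged.put(doc.getFieldValue("id"), doc);
      }
    }
    solr.shutdown();
    System.out.println("merged docs: " + merged.size());
  }
}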


On Mon, Jun 9, 2014 at 9:07 AM, Jack Krupansky 
wrote:

> Are they expecting relevancy ranking or merely seeking a bulk read of
> those documents? Please detail what the user is trying to accomplish with
> such a monster list of IDs.
>
> Generally, queries of more than a few dozen terms are a bad idea. If for
> no other reason than that if you need to debug them or examine the results
> by hand, it will be a nightmare. OTOH, some people really love drama and
> just can't get enough of it.
>
> The general guidance is to keep requests and responses relatively small.
> Keep network traffic down. Keep compute intensity down. Keep memory
> requirements down.
>
> Small is better.
>
> -- Jack Krupansky
>
> -Original Message- From: Joe Gresock
> Sent: Monday, June 9, 2014 8:50 AM
> To: solr-user@lucene.apache.org
> Subject: Large disjunction query practices
>
>
> I'm wondering what the best practice for large disjunct queries in Solr is.
> A user wants to submit a query for several hundred thousand terms, like:
> (term1 OR term2 OR ... term500,000)
>
> I know it might be better to break this up into multiple queries that can
> be merged on the user's end, but I'm wondering if there's guidance for a
> good limit of OR'ed terms per query.  100 terms?  200? 500?  Any idea what
> kinds of data set or memory limitations might govern this threshold?
>
> Thanks,
> Joe
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.*-Philippians 4:12-13*
>



-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Format version is not supported error

2014-06-10 Thread Joshi, Shital
Hi,

We upgraded from Solr version 4.4 to 4.8. In doing so we also upgraded from JDK 
1.6 to 1.7. After a few days of testing, we decided to move back to 4.4. We get 
the following error on all nodes and our cloud is not usable. How do we fix it?

Format version is not supported (resource: 
MMapIndexInput(path="/local/data/solr1/index.20140324041707963/segments.gen")): 
-3 (needs to be between -2 and -2)

We tried to switch back to 4.8 but get the same error. 

Any help is highly appreciated.

Thanks!




Custom QueryComponent to rewrite dismax query

2014-06-10 Thread Peter Keegan
We are using the 'edismax' query parser for its many benefits over the
standard Lucene parser. For queries with more than 5 or 6 keywords (which
is a lot for our typical user), the recall can be very high (sometimes
matching 75% or more of the documents). This high recall, when coupled with
some custom PostFilter scoring, is hurting the query performance.  I tried
varying the 'mm' (minimum match) parameter, but at values less than 100%,
the response time didn't improve much, and at 100%, there were often no
results, which is unacceptable.

So, I wrote a custom QueryComponent which rewrites the DisMax query.
Initially, the MinShouldMatch value is set to 100%. If the search returns 0
results, MinShouldMatch is set to 1 and the search is retried. This
improved the QPS throughput by about 2.5X. However, this only worked with
an unsharded index. With a sharded index, each shard returned only the
results from the first search (mm=100%). In the debugger, I could see 2
'response/ResultContext' NV-Pairs in the SolrQueryResponse object, so I
added code to remove the first pair if there were 2 pair present, which
fixed this problem. My question: is removing the extra ResultContext a
reasonable solution to this problem? It just seems a little brittle to me.
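
A sketch of that retry flow (Solr 4.x API; the exact bookkeeping follows my
description above, not tested code):

import java.io.IOException;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class RetryingQueryComponent extends QueryComponent {
  @Override
  public void process(ResponseBuilder rb) throws IOException {
    super.process(rb); // first pass, with mm=100% applied by the parser
    if (rb.getResults() != null && rb.getResults().docList.size() == 0) {
      Query q = rb.getQuery();
      if (q instanceof BooleanQuery) {
        ((BooleanQuery) q).setMinimumNumberShouldMatch(1); // relax mm
        // drop the stale "response" entry so only the retry's results remain
        NamedList<Object> values = rb.rsp.getValues();
        int idx = values.indexOf("response", 0);
        if (idx >= 0) values.remove(idx);
        super.process(rb); // second pass with the relaxed query
      }
    }
  }
}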

Thanks,
Peter


Re: How Can I modify the DocList and DocSet in solr

2014-06-10 Thread Joel Bernstein
Not sure if this helps, but in Solr 4.9 there is a new feature called
RankQueries. You can read about it here:
http://heliosearch.org/solrs-new-rankquery-feature/. Solr's new
ReRankingQParserPlugin is built off of RankQueries.



Joel Bernstein
Search Engineer at Heliosearch


On Tue, Jun 10, 2014 at 1:38 PM, eirslett  wrote:

> I have the same issue - I want to change the ordering of the DocList in a
> custom SearchComponent that is executed after QueryComponent.
> (
> http://stackoverflow.com/questions/24147213/custom-solr-sorting-that-is-aware-of-its-neighbours
> )
> However, it seems like you can't do that, because of the cache?
>
>
>


Re: How Can I modify the DocList and DocSet in solr

2014-06-10 Thread eirslett
I have the same issue - I want to change the ordering of the DocList in a
custom SearchComponent that is executed after QueryComponent.
(http://stackoverflow.com/questions/24147213/custom-solr-sorting-that-is-aware-of-its-neighbours)
However, it seems like you can't do that, because of the cache?





Re: How to store ACL in solr?

2014-06-10 Thread Philip Durbin
You can see an example of ACLs stored under "perms_ss" here:
http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4127576.html
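
The gist of that approach (a sketch; perms_ss comes from the linked post, the
group values are illustrative): each document stores the groups allowed to see
it in a multivalued field, and every query is filtered by the requesting
user's groups:

doc:   id=doc1, perms_ss=[group_public, group_42]
query: fq=perms_ss:(group_public OR group_42)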


On Tue, Jun 10, 2014 at 12:15 PM, lalitjangra 
wrote:

> Hi, I am indexing some content from a couple of content repositories into
> Solr and it works fine, as metadata attributes of the content items are
> indexed into Solr. Now I want to store the ACL of all content items into
> Solr and return the ACL back in search results. How can I achieve it? Is
> there any Solr plugin/connector available for this, or do I need to write
> custom Java code? If so, where do I start, i.e. which classes/interfaces
> do I inherit? Thanks a lot for help. Regards.
>
>
>




-- 
Philip Durbin
Software Developer for http://thedata.org
http://www.iq.harvard.edu/people/philip-durbin


Re: Large disjunction query practices

2014-06-10 Thread Gili Nachum
Yes, in most cases there is some other, better way to accomplish what
you're after; share your high-level goal.
By default, Lucene and Solr limit the max number of boolean clauses to 1024,
and even before that your performance would go down the drain. In
solrconfig.xml:

<maxBooleanClauses>1024</maxBooleanClauses>




On Tue, Jun 10, 2014 at 10:21 AM, Ahmet Arslan 
wrote:

> Hi,
>
> Where are these ORed terms coming from? A user cannot enter this many terms.
> There are other solutions: joins, post filters, etc. You need to tell us
> your high-level goal.
>
>
>
> On Monday, June 9, 2014 3:51 PM, Joe Gresock  wrote:
> I'm wondering what the best practice for large disjunct queries in Solr is.
> A user wants to submit a query for several hundred thousand terms, like:
> (term1 OR term2 OR ... term500,000)
>
> I know it might be better to break this up into multiple queries that can
> be merged on the user's end, but I'm wondering if there's guidance for a
> good limit of OR'ed terms per query.  100 terms?  200? 500?  Any idea what
> kinds of data set or memory limitations might govern this threshold?
>
> Thanks,
> Joe
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.*-Philippians 4:12-13*
>
>


Re: How to store ACL in solr?

2014-06-10 Thread Alexandre Rafalovitch
Did you look at ManifoldCF?
On 10/06/2014 11:16 pm, "lalitjangra"  wrote:

> Hi, I am indexing some content from a couple of content repositories into
> Solr and it works fine, as metadata attributes of the content items are
> indexed into Solr. Now I want to store the ACL of all content items into
> Solr and return the ACL back in search results. How can I achieve it? Is
> there any Solr plugin/connector available for this, or do I need to write
> custom Java code? If so, where do I start, i.e. which classes/interfaces
> do I inherit? Thanks a lot for help. Regards.
>
>
>


How to store ACL in solr?

2014-06-10 Thread lalitjangra
Hi, I am indexing some content from a couple of content repositories into
Solr and it works fine, as metadata attributes of the content items are
indexed into Solr. Now I want to store the ACL of all content items into
Solr and return the ACL back in search results. How can I achieve it? Is
there any Solr plugin/connector available for this, or do I need to write
custom Java code? If so, where do I start, i.e. which classes/interfaces do
I inherit? Thanks a lot for help. Regards.




RE: ANN: Solr Next

2014-06-10 Thread Jean-Sebastien Vachon
Hi Yonik,

Very impressive results. Looking forward to using this on our systems. Any idea
what's the plan for this feature? Will it make its way into Solr 4.9, or do we
have to switch to Heliosearch to be able to use it?

Thanks

> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: June-09-14 10:50 AM
> To: solr-user@lucene.apache.org
> Subject: Re: ANN: Solr Next
> 
> On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley  wrote:
> [...]
> > Next major feature: Native Code Optimizations.
> > In addition to moving more large data structures off-heap(like
> > UnInvertedField?), I am planning to implement native code
> > optimizations for certain hotspots.  Native code faceting would be an
> > obvious first choice since it can often be a CPU bottleneck.
> 
> It's in!  Abbreviated report: 2x performance increase over stock solr faceting
> (which is already fast!) http://heliosearch.org/native-code-faceting/
> 
> -Yonik
> http://heliosearch.org -- making solr shine
> 
> > Project resources:
> >
> > https://github.com/Heliosearch/heliosearch
> >
> > https://groups.google.com/forum/#!forum/heliosearch
> > https://groups.google.com/forum/#!forum/heliosearch-dev
> >
> > Freenode IRC: #heliosearch #heliosearch-dev
> >
> > -Yonik
> 


Re: can Solr keep its last index position?

2014-06-10 Thread Shawn Heisey
On 6/10/2014 2:11 AM, usmanZahid wrote:
> I am working with the Solr search engine and am at a stage where I have to
> make some implementation decisions.
>
> I have a large file directory (1TB), and while indexing it for the first time
> we need to maintain a history of our indexing position so that we can run the
> indexer for 10 hours every day until all documents are indexed.
>
> How can we keep track of where Solr left off last time, so that the next
> indexing run starts at the right place?
>
> How do we track changes in documents which are already indexed, or detect new
> files? (Do I have to start the index again from scratch in this case?)
>
> How can I increase performance while indexing?

All this depends on how you are indexing and what information needs to
be tracked.  If you are using the dataimport handler and the information
you need to track is the timestamp when the last import started, then
the dataimport handler automatically tracks this information and it will
be available for a delta-import.
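
As an illustration (a sketch; the table and column names are made up, the
${dataimporter.last_index_time} property is what DIH records in
conf/dataimport.properties):

<entity name="docs"
        query="SELECT id, body FROM docs"
        deltaQuery="SELECT id FROM docs
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, body FROM docs
                          WHERE id = '${dataimporter.delta.id}'"/>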

If the information that needs to be tracked is different (an identifier,
a filename, autoincrement value, etc) or you are not using the
dataimport handler, then you must keep track of the information
yourself, in any way that makes sense for your program.

To increase indexing speed, use multiple threads in your indexing
program that are indexing documents simultaneously.  Solr can handle
many threads all receiving update requests at the same time.
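
For example, SolrJ's ConcurrentUpdateSolrServer manages a pool of update
threads for you (a sketch; the URL, queue size, and thread count are
placeholders):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    // buffers up to 1000 docs and drains the queue with 4 parallel threads
    ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://localhost:8983/solr/collection1", 1000, 4);
    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("title", "Document " + i);
      server.add(doc); // queued; background threads do the HTTP work
    }
    server.blockUntilFinished();
    server.commit();
    server.shutdown();
  }
}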

Thanks,
Shawn



RE: Recommended ZooKeeper topology in Production

2014-06-10 Thread Markus Jelsma
Yes, always use three or a higher odd number of machines. It is best to have 
them on dedicated machines, and unless the cluster is very large, three small 
VPS machines with 512 MB RAM suffice.
 
-Original message-
From:Gili Nachum 
Sent:Tue 10-06-2014 08:58
Subject:Recommended ZooKeeper topology in Production
To:solr-user@lucene.apache.org; 
Is there a recommended ZooKeeper topology for production Solr environments?

I was planning: 3 ZK nodes, each on its own dedicated machine.

Thinking that dedicated machines, separate from Solr servers, would keep ZK
isolated from resource contention spikes that may occur on Solr. Also, if a
Solr machine goes down, there would still be 3 ZK nodes to handle the event
properly.

If I want to save on resources, is placing each ZK instance on the same box
as a Solr instance considered common practice in production environments?

Thanks!


RE: Edismax should, should not, exact match operators

2014-06-10 Thread Markus Jelsma
http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax
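
In short, yes. For example (field contents illustrative):

q=+solr -panda "apache lucene"

edismax treats + as a required term, - as an excluded term, and quoted text
as a phrase.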
 
-Original message-
From:michael.boom 
Sent:Tue 10-06-2014 13:15
Subject:Edismax should, should not, exact match operators
To:solr-user@lucene.apache.org; 
On Google a user can query using operators like "+" or "-" and quote the
desired term in order to get the desired match.
Does something like this come by default with the edismax parser?



-
Thanks,
Michael


Edismax should, should not, exact match operators

2014-06-10 Thread michael.boom
On Google a user can query using operators like "+" or "-" and quote the
desired term in order to get the desired match.
Does something like this come by default with the edismax parser?



-
Thanks,
Michael


Replicate a standalone collection

2014-06-10 Thread Alexandre Rafalovitch
Hello,

Is there a way to clone/replicate a standalone collection. I know the
Cloud API does it but I want the replica to not be
synchronized/updated after the copy is done. And I don't know Cloud
API code enough to know what the consequences are.

The specific scenario is for creating tutorial collections. I want to
run a sequence of commands against the repository and - at specific
points - clone off the progress-to-date to a new collection. Then,
individual check-points exist in parallel with separate names.
Alternatively, I can start with X number of collection replicas and
drop them off one by one. But then, how do I query them separately?

This creation process will be happening offline and single-threaded,
so any reasonable approach would be accepted. I can even run in cloud
mode, as long as the "separate after copy" requirement is maintained.

My worst case scenarios are probably just re-rolling the sequence from
the beginning against each new collection with a bit more script each
time. Or, closing a core, doing physical copy and reopening. So, I am
hoping for something better.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


Re: Solr Certificat

2014-06-10 Thread ienjreny
I am looking for the part "I can prove my knowledge".

Thanks.


On Tue, Jun 10, 2014 at 11:30 AM, Alexandre Rafalovitch [via Lucene] <
ml-node+s472066n4140947...@n3.nabble.com> wrote:

> You may need to be more precise on the meaning of the word here.
> Certificate as in SSL? Or as in "I can prove my knowledge"? Or,
> something different again? The answer, BTW, is NO for both of my
> suggestions, but there might be additional information.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Jun 10, 2014 at 3:15 PM, ienjreny <[hidden email]
> > wrote:
> > Hello dears,
> >
> > Is there any certificate for Solr?
>
>





Re: Solr Certificat

2014-06-10 Thread Alexandre Rafalovitch
You may need to be more precise on the meaning of the word here.
Certificate as in SSL? Or as in "I can prove my knowledge"? Or,
something different again? The answer, BTW, is NO for both of my
suggestions, but there might be additional information.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jun 10, 2014 at 3:15 PM, ienjreny  wrote:
> Hello dears,
>
> Is there any certificate for Solr?


can Solr keep its last index position?

2014-06-10 Thread usmanZahid
Hi

I am working with the Solr search engine and am at a stage where I have to
make some implementation decisions.

I have a large file directory (1TB), and while indexing it for the first time
we need to maintain a history of our indexing position so that we can run the
indexer for 10 hours every day until all documents are indexed.

How can we keep track of where Solr left off last time, so that the next
indexing run starts at the right place?

How do we track changes in documents which are already indexed, or detect new
files? (Do I have to start the index again from scratch in this case?)

How can I increase performance while indexing?

Thanks





Re: Performance/scaling with custom function queries

2014-06-10 Thread Robert Krüger
Great, I was hoping for that. In my case I will have to deal with the
worst case scenario, i.e. all documents matching the query, because
the only criterion is the fingerprint and the result of the
distance/similarity function which will have to be executed for every
document. However, I am dealing with a scenario where there will not
be many concurrent users.

Thank you.

On Mon, Jun 9, 2014 at 1:57 AM, Joel Bernstein  wrote:
> You only need to have fast access to the fingerprint field so only that
> field needs to be in memory. You'll want to review how Lucene DocValues and
> FieldCache work. Sorting is done with a PriorityQueue so only the top N
> docs are kept in memory.
>
> You'll only need to access the fingerprint field values for documents that
> match the query, so it won't be a full table scan unless all the docs match
> the query.
>
> Sounds like an interesting project. Please keep us posted.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Sun, Jun 8, 2014 at 6:17 AM, Robert Krüger  wrote:
>
>> Hi,
>>
>> let's say I have an index that contains a field of type BinaryField
>> called "fingerprint" that stores a few (let's say 100) bytes that are
>> some kind of digital fingerprint-like thing.
>>
>> Let's say I want to perform queries on that field to achieve sorting
>> or filtering based on a kind of custom distance function
>> "customDistance", i.e. I input a reference "fingerprint" and Solr
>> returns either all documents sorted by
>> customDistance(referenceFingerprint, documentFingerprint) or use
>> that in an frange expression for filtering.
>>
>> I have read http://wiki.apache.org/solr/SolrPerformanceFactors and I
>> do understand that using function queries with a custom function is
>> definitely an expensive thing as it will result in what is called a
>> "full table scan" in the sql world, i.e. data from all documents needs
>> to be touched to select the correct documents or sort by a function's
>> result.
>>
>> Given all that and provided, I have to use a custom function for my
>> needs, I would like to know a few more details about solr architecture
>> to understand what I have to look out for.
>>
>> I will have potentially millions of records. Does the data contained
>> in other index fields play a role when I only use the "fingerprint"
>> field for sorting and searching when it comes to RAM usage? I am
>> hoping that my RAM only needs to accommodate the fingerprint data of
>> all documents for queries to be fast, not the fingerprint data plus
>> all other indexed or stored data.
>>
>> Example: My fingerprint data needs 100bytes per document, my other
>> indexed field data needs 900 bytes per document. Will I need 100MB or
>> 1GB to fit all data that is needed to process one query in memory?
>>
>> Are there other things to be aware of?
>>
>> Thanks,
>>
>> Robert
>>



-- 
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG

www.lesspain-software.com


Solr Certificat

2014-06-10 Thread ienjreny
Hello dears,

Is there any certificate for Solr?





Re: Large disjunction query practices

2014-06-10 Thread Ahmet Arslan
Hi,

Where are these ORed terms coming from? A user cannot enter this many terms.
There are other solutions: joins, post filters, etc. You need to tell us your 
high-level goal.



On Monday, June 9, 2014 3:51 PM, Joe Gresock  wrote:
I'm wondering what the best practice for large disjunct queries in Solr is.
A user wants to submit a query for several hundred thousand terms, like:
(term1 OR term2 OR ... term500,000)

I know it might be better to break this up into multiple queries that can
be merged on the user's end, but I'm wondering if there's guidance for a
good limit of OR'ed terms per query.  100 terms?  200? 500?  Any idea what
kinds of data set or memory limitations might govern this threshold?

Thanks,
Joe

-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*



Re: accessing individual elements of a multivalued field

2014-06-10 Thread Ahmet Arslan
Hi,

There is a workaround with dynamic fields. The trick is that you embed data
into the field name. Normally data is assigned to the field value; in this
approach the field name carries data too.

Instead of multivalued cat field, define cat_* dynamic field. 
https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields

prod: p
cat : catA,catB,catC

becomes

prod: p
cat_1 : catA
cat_2 : catB
cat_3 : catC

return prod where  (cat_1 == catA) or (cat_2==catB).

becomes q=cat_1:catA OR cat_2:catB
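
A schema.xml sketch of the dynamic field backing this (the type and
attributes are assumptions):

<dynamicField name="cat_*" type="string" indexed="true" stored="true" />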


There is another solution (FieldMaskingSpanQuery) but it requires writing
custom Java code (a Solr plugin):
http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html




On Monday, June 9, 2014 9:54 PM, kritarth.anand  
wrote:
hi,

prod: p
cat : catA,catB,catC

prod :q 
cat : catB, catC,catD

My schema consists of documents with uid 'prod', and they can belong to
multiple categories called 'cat', which are represented as a multivalued
field. For a particular kind of query I need to access individual elements
separately, as in:

return prod where  (cat_1 == catA) or (cat_2 == catB). Is there a way by
which I can do that?

thanks in advance


