Re: Mongo DB Users

2014-09-16 Thread Joan
Remove please

2014-09-16 6:59 GMT+02:00 Patti Kelroe-Cooke patt...@gmail.com:

 Remove

 Kind regards
 Patti

 On Mon, Sep 15, 2014 at 5:35 PM, Aaron Susan aaronsus...@gmail.com
 wrote:

  Hi,
 
  I am here to inform you that we are having a contact list of *Mongo DB
  Users *would you be interested in it?
 
  Data Field’s Consist Of: Name, Job Title, Verified Phone Number, Verified
  Email Address, Company Name & Address, Employee Size, Revenue size, SIC
  Code, Industry Type etc.,
 
  We also provide other technology users as well depends on your
 requirement.
 
  For Example:
 
 
  *Red Hat *
 
  *Terra data *
 
  *Net-app *
 
  *NuoDB*
 
  *MongoHQ ** and many more*
 
 
  We also provide IT Decision Makers, Sales and Marketing Decision Makers,
  C-level Titles and other titles as per your requirement.
 
  Please review and let me know your interest if you are looking for above
  mentioned users list or other contacts list for your campaigns.
 
  Waiting for a positive response!
 
  Thanks
 
  *Aaron Susan*
  Data Specialist
 
  If you are not the right person, feel free to forward this email to the
  right person in your organization. To opt out response Remove
 



Re: Mongo DB Users

2014-09-16 Thread Amey Patil
Remove.

On Tue, Sep 16, 2014 at 12:58 PM, Joan joan.monp...@gmail.com wrote:

 Remove please

 2014-09-16 6:59 GMT+02:00 Patti Kelroe-Cooke patt...@gmail.com:

  Remove
 
  Kind regards
  Patti
 


Re: Mongo DB Users

2014-09-16 Thread Karolina Dobromiła Jeleń
remove please

On Tue, Sep 16, 2014 at 9:35 AM, Amey Patil amey.pa...@germin8.com wrote:

 Remove.

 On Tue, Sep 16, 2014 at 12:58 PM, Joan joan.monp...@gmail.com wrote:

  Remove please
 
  2014-09-16 6:59 GMT+02:00 Patti Kelroe-Cooke patt...@gmail.com:
 
   Remove
  
   Kind regards
   Patti
  




-- 
Shortest distance between two points is science. - Warrick Brown

μ!  μ!  μ!  8D~


Re: solr/lucene 4.10 out of memory issues

2014-09-16 Thread Luis Carlos Guerrero
Thanks for the response, I've been working on solving some of the most
evident issues and I also added your garbage collector parameters. First of
all the Lucene field cache is being filled with some entries which are
marked as 'insanity'. Some of these were related to a custom field that we
use for our ranking. We fixed our custom plugin classes so that we wouldn't
see any entries related to those fields there, but it seems there are other
related problems with the field cache. Mainly the cache is being filled
with these types of insanity entries:

'SUBREADER: Found caches for descendants of StandardDirectoryReader'

They are all related to standard solr fields. Could it be that our current
schemas and configs have some incorrect setting that is not compliant with
this lucene version? I'll keep investigating the subject but if there is
any additional information you can give me about these types of field cache
insanity warnings it would be really helpful.
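
For reference, a minimal sketch of how those insanity entries can be dumped from
inside the Solr JVM (an illustration only, assuming Lucene 4.x where
FieldCacheSanityChecker still exists; it has to run in-process, e.g. from a
custom handler or a unit test that exercises your plugins):

import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.FieldCacheSanityChecker;
import org.apache.lucene.util.FieldCacheSanityChecker.Insanity;

public class FieldCacheInsanityDump {
  public static void main(String[] args) {
    // All entries currently held by the global field cache.
    FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
    System.out.println("FieldCache entries: " + entries.length);

    // Cross-check the entries for duplicate caches on overlapping readers.
    for (Insanity insanity : FieldCacheSanityChecker.checkSanity(entries)) {
      // SUBREADER means both a top-level reader and its segment readers
      // populated the cache for the same field.
      System.out.println(insanity.getType() + ": " + insanity);
    }
  }
}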

On Thu, Sep 11, 2014 at 3:00 PM, Timothy Potter thelabd...@gmail.com
wrote:

 Probably need to look at it running with a profiler to see what's up.
 Here's a few additional flags that might help the GC work better for
 you (which is not to say there isn't a leak somewhere):

 -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40

 This should lead to a nice up-and-down GC profile over time.

 On Thu, Sep 11, 2014 at 10:52 AM, Luis Carlos Guerrero
 lcguerreroc...@gmail.com wrote:
  hey guys,
 
  I'm running a solrcloud cluster consisting of five nodes. My largest
 index
  contains 2.5 million documents and occupies about 6 gigabytes of disk
  space. We recently switched to the latest solr version (4.10) from
 version
  4.4.1 which we ran successfully for about a year without any major
 issues.
  From the get go we started having memory problems caused by the CMS old
  heap usage being filled up incrementally. It starts out with a very low
  memory consumption and after 12 hours or so it ends up using up all
  available heap space. We thought it could be one of the caches we had
  configured, so we reduced our main core filter cache max size from 1024
 to
  512 elements. The only thing we accomplished was that the cluster ran
 for a
  longer time than before.
 
  I generated several heapdumps and basically what is filling up the heap
 is
  lucene's field cache. it gets bigger and bigger until it fills up all
  available memory.
 
  My jvm memory settings are the following:
 
  -Xms15g -Xmx15g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=5g
  -XX:MaxNewSize=5g
  -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+PrintGCDateStamps
  -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
 -XX:+UseConcMarkSweepGC
  What's weird to me is that we didn't have this problem before, I'm
 thinking
  this is some kind of memory leak issue present in the new lucene. We ran
  our old cluster for several weeks at a time without having to redeploy
  because of config changes or other reasons. Was there some issue reported
  related to elevated memory consumption by the field cache?
 
  any help would be greatly appreciated.
 
  regards,
 
  --
  Luis Carlos Guerrero
  about.me/luis.guerrero




-- 
Luis Carlos Guerrero
about.me/luis.guerrero


Re: Append children documents for nested document

2014-09-16 Thread bradhill99
Can anyone help me with this? Or does Solr not support adding additional child
documents to an existing parent document?





Re: Mongo DB Users

2014-09-16 Thread Siegfried Goeschl

remove please

On 16.09.14 15:42, Karolina Dobromiła Jeleń wrote:

remove please

On Tue, Sep 16, 2014 at 9:35 AM, Amey Patil amey.pa...@germin8.com wrote:


Remove.

On Tue, Sep 16, 2014 at 12:58 PM, Joan joan.monp...@gmail.com wrote:


Remove please

2014-09-16 6:59 GMT+02:00 Patti Kelroe-Cooke patt...@gmail.com:


Remove

Kind regards
Patti


Re: Mongo DB Users

2014-09-16 Thread Suman Ghosh
Remove



RE: Apache Solr license Cost

2014-09-16 Thread nitin.kumar.gupta
Hi Team - I want to recommend the Apache Solr - Enterprise Search engineer for 
one of our clients. Could you please send the license/support cost & features of 
the product?

Rgds,
Nitin Kumar Gupta
Accenture Technology - IDC
3rd to 5th floor, Tower-B, SP Infocity, Plot No. 243,
Udyog Vihar, Phase-1, Gurgaon - 122016
Mobile: +91 9811208895
E-mail: nitin.kumar.gu...@accenture.commailto:nitin.kumar.gu...@accenture.com




This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
__

www.accenture.com


Performance of Unsorted Queries

2014-09-16 Thread Ilya Bernshteyn
If I query for IDs and I do not care about order, should I still expect
better performance paging the results? (e.g. rows=1000 or rows=1) The
use case is that I need to get all of the IDs regardless (there will be
thousands, maybe 10s of thousands, but not millions)

Example query:

http://domain/solr/select?q=ACCT_ID%3A1153&fq=SOME_FIELD%3ASomeKeyword%2C+SOME_FIELD_2%3ASomeKeyword&rows=1&fl=ID&wt=json

With this kind of query, I notice that rows=10 returns in 5ms, while
rows=1 (producing about 7000 results) returns in about 500ms.

Another way to word my question, if I have 100k not ordered IDs to
retrieve, is performance better getting 1k at a time or all 100k at the
same time?

Thanks,

Ilya


Re: Performance of Unsorted Queries

2014-09-16 Thread Michael Della Bitta
Performance would be better getting them all at the same time, but the
behavior would kind of stink (long pause before a response, big results
stuck in memory, etc).

If you're using a relatively up-to-date version of Solr, you should check
out the cursormark feature:
https://wiki.apache.org/solr/CommonQueryParameters#Deep_paging_with_cursorMark

That's the magic knock that will get you what you want.
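
For illustration, a minimal SolrJ sketch of that cursorMark loop (the URL,
collection name and the ACCT_ID / ID field names are placeholders taken from
your example; cursorMark requires a sort that includes the uniqueKey field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkPaging {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("ACCT_ID:1153");
    q.setFields("ID");
    q.setRows(1000);                       // page size
    q.setSort("ID", SolrQuery.ORDER.asc);  // cursorMark needs a sort on the uniqueKey

    String cursor = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = server.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        // collect doc.getFieldValue("ID")
      }
      String next = rsp.getNextCursorMark();
      if (cursor.equals(next)) {
        break;                             // cursor did not advance: done
      }
      cursor = next;
    }
    server.shutdown();
  }
}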

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Tue, Sep 16, 2014 at 11:03 AM, Ilya Bernshteyn ily...@gmail.com wrote:

 If I query for IDs and I do not care about order, should I still expect
 better performance paging the results? (e.g. rows=1000 or rows=1) The
 use case is that I need to get all of the IDs regardless (there will be
 thousands, maybe 10s of thousands, but not millions)

 Example query:


  http://domain/solr/select?q=ACCT_ID%3A1153&fq=SOME_FIELD%3ASomeKeyword%2C+SOME_FIELD_2%3ASomeKeyword&rows=1&fl=ID&wt=json

 With this kind of query, I notice that rows=10 returns in 5ms, while
 rows=1 (producing about 7000 results) returns in about 500ms.

 Another way to word my question, if I have 100k not ordered IDs to
 retrieve, is performance better getting 1k at a time or all 100k at the
 same time?

 Thanks,

 Ilya



Re: Performance of Unsorted Queries

2014-09-16 Thread Jürgen Wagner (DVT)
Depending on the size of the individual records returned, I'd use a
decent size window (to minimize network and marshalling/unmarshalling
overhead) of maybe 1000-1 items sorted by id, and use that in
combination with cursorMark. That will be easier on the server side in
terms of garbage collection.

Best regards,
--Jürgen

On 16.09.2014 17:03, Ilya Bernshteyn wrote:
 If I query for IDs and I do not care about order, should I still expect
 better performance paging the results? (e.g. rows=1000 or rows=1) The
 use case is that I need to get all of the IDs regardless (there will be
 thousands, maybe 10s of thousands, but not millions)

 Example query:

  http://domain/solr/select?q=ACCT_ID%3A1153&fq=SOME_FIELD%3ASomeKeyword%2C+SOME_FIELD_2%3ASomeKeyword&rows=1&fl=ID&wt=json

 With this kind of query, I notice that rows=10 returns in 5ms, while
 rows=1 (producing about 7000 results) returns in about 500ms.

 Another way to word my question, if I have 100k not ordered IDs to
 retrieve, is performance better getting 1k at a time or all 100k at the
 same time?

 Thanks,

 Ilya



-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center Intelligence
 Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
mailto:juergen.wag...@devoteam.com, URL: www.devoteam.de
http://www.devoteam.de/


Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




RE: solr query gives different numFound upon refreshing

2014-09-16 Thread Joshi, Shital
We wrote a script which queries each Solr instance in cloud 
(http://$host/solr/replication?command=details) and subtracts the 
‘replicableVersion’ number from the ‘indexVersion’ number, converts to minutes, 
and alerts if the minutes exceed 20. We get alerted many times a day. The soft 
commit setting is every 7 minutes. 

Any idea what might be wrong here?

This is our commit setting. 

<autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>10</maxDocs>
   <openSearcher>false</openSearcher>
 </autoCommit>
 <autoSoftCommit>
   <maxTime>45</maxTime>
</autoSoftCommit>

We got rid of all max new searcher errors. 
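
As a quick way to check whether the replicas converge after the hard commit
Erick suggests below, here is a minimal SolrJ sketch (host and collection
names are placeholders; distrib=false makes each replica report only its own
local count):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ReplicaCountCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder URLs: one entry per replica core being monitored.
    String[] replicas = {
      "http://host1:8983/solr/collection1",
      "http://host2:8983/solr/collection1"
    };

    // Hard commit; sent to a SolrCloud node it is forwarded to the whole collection.
    HttpSolrServer first = new HttpSolrServer(replicas[0]);
    first.commit();
    first.shutdown();

    // Compare per-replica counts; distrib=false keeps the query local to each core.
    for (String url : replicas) {
      HttpSolrServer server = new HttpSolrServer(url);
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0);
      q.set("distrib", "false");
      long numFound = server.query(q).getResults().getNumFound();
      System.out.println(url + " numFound=" + numFound);
      server.shutdown();
    }
  }
}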


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, September 04, 2014 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: solr query gives different numFound upon refreshing

Does this persist if you issue a hard commit? You can do something like
http://solr/collection/update?stream.body=<commit/>

On Thu, Sep 4, 2014 at 2:19 PM, shamik sham...@gmail.com wrote:
 I've noticed similar behavior with our Solr cloud cluster for a while, it's
 random though. We've 2 shards with 3 replicas each. At times, I've observed
 that the same query on refresh will fetch different results (numFound) as
 well as the content. The only way to mitigate is to refresh the index with
 the documents till the nodes are in sync. I always use SolrJ which talks to
 Solr through zookeeper, even with that it seemed to be unavoidable at times.
 We are committing every 10 mins. I'm pretty much sure there's a minor glitch
 which creates a sync issue at times.





Re: Apache Solr license Cost

2014-09-16 Thread Alexandre Rafalovitch
Are you asking about the consultant or about the product itself? The
product itself is free and open source, unless you want to get one of the
several commercial distributions. In the latter case, you may want to reach out
to their sales team directly.

If you are looking for a consultant/company to support Solr, you may want
to be more specific about areas of expertise, client's location, etc.
And/or check the wiki that has a consultants list (somewhat out of date).

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 16 September 2014 10:37, nitin.kumar.gu...@accenture.com wrote:

  Hi Team – I want to recommend the Apache Solr – Enterprise Search
 engineer for one of our clients. Could you please send the license/support
 cost & features of the product?



 Rgds,

 *Nitin Kumar Gupta*

 Accenture Technology - IDC

 3rd to 5th floor, Tower-B, SP Infocity, Plot No.
 243,


 Udyog Vihar, Phase-1, Gurgaon - 122016

 Mobile: +91 9811208895

 E-mail: nitin.kumar.gu...@accenture.com




 --

 This message is for the designated recipient only and may contain
 privileged, proprietary, or otherwise confidential information. If you have
 received it in error, please notify the sender immediately and delete the
 original. Any other use of the e-mail by you is prohibited. Where allowed
 by local law, electronic communications with Accenture and its affiliates,
 including e-mail and instant messaging (including content), may be scanned
 by our systems for the purposes of information security and assessment of
 internal compliance with Accenture policy.

 __

 www.accenture.com



Re: solr/lucene 4.10 out of memory issues

2014-09-16 Thread Luis Carlos Guerrero
I checked and these 'insanity' cached keys correspond to fields we use for
both grouping and faceting. The same behavior is documented here:
https://issues.apache.org/jira/browse/SOLR-4866, although I have single
shards for every replica which the jira says is a setup which should not
generate these issues.

What I don't get is why the cluster was running fine with Solr 4.4, although,
double-checking, I see that I was using LUCENE_40 as the match version. If I
use that match version in my currently running 4.10 cluster will it make a
difference, or will I experience more issues than if I just roll back to
4.4 with the LUCENE_40 match version? The problem in the end is that the
field cache grows without bound. I'm thinking it's because of the insanity
entries but I'm not really sure. It seems like a really big problem to leave
unattended, or is the use case for faceting and grouping on the same field
not that common?

On Tue, Sep 16, 2014 at 11:06 AM, Luis Carlos Guerrero 
lcguerreroc...@gmail.com wrote:

 Thanks for the response, I've been working on solving some of the most
 evident issues and I also added your garbage collector parameters. First of
 all the Lucene field cache is being filled with some entries which are
 marked as 'insanity'. Some of these were related to a custom field that we
 use for our ranking. We fixed our custom plugin classes so that we wouldn't
 see any entries related to those fields there, but it seems there are other
 related problems with the field cache. Mainly the cache is being filled
 with these types of insanity entries:

 'SUBREADER: Found caches for descendants of StandardDirectoryReader'

 They are all related to standard solr fields. Could it be that our current
 schemas and configs have some incorrect setting that is not compliant with
 this lucene version? I'll keep investigating the subject but if there is
 any additional information you can give me about these types of field cache
 insanity warnings it would be really helpful.

 On Thu, Sep 11, 2014 at 3:00 PM, Timothy Potter thelabd...@gmail.com
 wrote:

 Probably need to look at it running with a profiler to see what's up.
 Here's a few additional flags that might help the GC work better for
 you (which is not to say there isn't a leak somewhere):

 -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40

 This should lead to a nice up-and-down GC profile over time.

 On Thu, Sep 11, 2014 at 10:52 AM, Luis Carlos Guerrero
 lcguerreroc...@gmail.com wrote:
  hey guys,
 
  I'm running a solrcloud cluster consisting of five nodes. My largest
 index
  contains 2.5 million documents and occupies about 6 gigabytes of disk
  space. We recently switched to the latest solr version (4.10) from
 version
  4.4.1 which we ran successfully for about a year without any major
 issues.
  From the get go we started having memory problems caused by the CMS old
  heap usage being filled up incrementally. It starts out with a very low
  memory consumption and after 12 hours or so it ends up using up all
  available heap space. We thought it could be one of the caches we had
  configured, so we reduced our main core filter cache max size from 1024
 to
  512 elements. The only thing we accomplished was that the cluster ran
 for a
  longer time than before.
 
  I generated several heapdumps and basically what is filling up the heap
 is
  lucene's field cache. it gets bigger and bigger until it fills up all
  available memory.
 
  My jvm memory settings are the following:
 
  -Xms15g -Xmx15g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=5g
  -XX:MaxNewSize=5g
  -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+PrintGCDateStamps
  -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
 -XX:+UseConcMarkSweepGC
  What's weird to me is that we didn't have this problem before, I'm
 thinking
  this is some kind of memory leak issue present in the new lucene. We ran
  our old cluster for several weeks at a time without having to redeploy
  because of config changes or other reasons. Was there some issue
 reported
  related to elevated memory consumption by the field cache?
 
  any help would be greatly appreciated.
 
  regards,
 
  --
  Luis Carlos Guerrero
  about.me/luis.guerrero




 --
 Luis Carlos Guerrero
 about.me/luis.guerrero




-- 
Luis Carlos Guerrero
about.me/luis.guerrero


Re: Mongo DB Users

2014-09-16 Thread Xavier Morera
I think what some people are actually saying is "burn in hell, Aaron Susan,
for using a Solr Apache DL for marketing purposes"?

On Tue, Sep 16, 2014 at 8:31 AM, Suman Ghosh suman.ghos...@gmail.com
wrote:

 Remove


-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


Solr mapred MTree merge stage hangs repeatably in 4.10 (but not 4.9)

2014-09-16 Thread Brett Hoerner
I have a very weird problem that I'm going to try to describe here to see
if anyone has any ah-ha moments or clues. I haven't created a small
reproducible project for this but I guess I will have to try in the future
if I can't figure it out. (Or I'll need to bisect by running long Hadoop
jobs...)

So, the facts:

* Have been successfully using Solr mapred to build very large Solr
clusters for months
* As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge phase
in 4.10
* Those same jobs (same input, output, and Hadoop cluster itself) succeed
if I only change my Solr deps to 4.9
* The job *does succeed* in 4.10 if I use the same data to create more, but
smaller shards (e.g. 12x as many shards each 1/12th the size of the job
that fails)
* Creating my normal size shards (the size I want, that works in 4.9) the
job hangs with 2 mappers running, 0 reducers in the MTree merge phase
* There are no errors or warning in the syslog/stderr of the MTree mappers,
no errors ever echo'd back to the interactive run of the job (mapper says
100%, reduce says 0%, will stay forever)
* No CPU being used on the boxes running the merge, no GC happening, JVM
waiting on a futex, all threads blocked on various queues
* No disk usage problems, nothing else obviously wrong with any box in the
cluster

I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred
contrib, mostly some test stuff. I didn't see any transitive dependency
changes in Solr/Lucene that look like they would affect me.


Access solr cloud via ssh tunnel?

2014-09-16 Thread Michael Joyner

I am in a situation where I need to access a solrcloud behind a firewall.

I have a tunnel enabled to one of the zookeepers as a starting point, and 
the following test code:


CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("test");
SolrPingResponse p = server.ping();
System.out.println(p.getRequestUrl());

Right now it just hangs without any errors... what additional ports 
need forwarding and other configurations need setting to access a 
solrcloud over a ssh tunnel or tunnels?


Re: Access solr cloud via ssh tunnel?

2014-09-16 Thread Doug Balog
Not sure if this will work, but try to use ssh to set up a SOCKS proxy via
the -D command option.
Then set socksProxyHost and socksProxyPort via the java command line
(i.e. java -DsocksProxyHost=localhost) or
System.setProperty("socksProxyHost", "localhost") from your code. Make sure
to specify both the host and the port.
See
http://docs.oracle.com/javase/7/docs/api/java/net/doc-files/net-properties.html
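
A minimal sketch of how that would look with the test code from the original
post (as above, no guarantee this works for every connection SolrJ/ZooKeeper
opens; the proxy port 1080 is just an assumption for whatever value was passed
to ssh -D, and the ZooKeeper host:port must be the address as reachable from
inside the firewall, not the local tunnel endpoint):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class TunneledPing {
  public static void main(String[] args) throws Exception {
    // Route JVM socket traffic through the SOCKS proxy opened with "ssh -D 1080 user@gateway".
    System.setProperty("socksProxyHost", "localhost");
    System.setProperty("socksProxyPort", "1080");

    // Use the real ZooKeeper host:port as known behind the firewall.
    CloudSolrServer server = new CloudSolrServer("zkhost:2181");
    server.setDefaultCollection("test");
    SolrPingResponse p = server.ping();
    System.out.println("ping status: " + p.getStatus());
    server.shutdown();
  }
}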



On Tue, Sep 16, 2014 at 3:25 PM, Michael Joyner mich...@newsrx.com wrote:

 I am in a situation where I need to access a solrcloud behind a firewall.

 I have a tunnel enabled to one of the zookeeper as a starting points and
 the following test code:

  CloudSolrServer server = new CloudSolrServer("localhost:2181");
  server.setDefaultCollection("test");
  SolrPingResponse p = server.ping();
  System.out.println(p.getRequestUrl());

 Right now it just hangs without any errors... what additional ports need
 forwarding and other configurations need setting to access a solrcloud over
 a ssh tunnel or tunnels?



Re: Mongo DB Users

2014-09-16 Thread Fermin Silva
Remove

On Tue, Sep 16, 2014 at 2:19 PM, Xavier Morera xav...@familiamorera.com
wrote:

 I think what some people are actually saying is burn in hell Aaron Susan
 for using a solr apache dl for marketing purposes?

 On Tue, Sep 16, 2014 at 8:31 AM, Suman Ghosh suman.ghos...@gmail.com
 wrote:

  Remove
 



Re: Access solr cloud via ssh tunnel?

2014-09-16 Thread Jürgen Wagner (DVT)
In a test scenario, I used stunnel for connections between some
zookeeper observers and the central ensemble, as well as between a SolrJ
4.9.0 client and the central zookeepers. This is entirely transparent
modulo performance penalties due to network latency and ssl overhead. I
finally ended up with placing the observer node close to the SolrJ client.

Depending on what kind of network connection is between the SolrJ client
and the cluster, you may run into TCP MTU issues or packet fragmentation
problems. Hard to say what's happening without knowing any details on
the nature of the tunnel.

Try testing some four-letter commands from the SolrJ client machine,
e.g. echo ruok | nc localhost 2181. Does that work?

Best regards,
--Jürgen

On 16.09.2014 21:25, Michael Joyner wrote:
 I am in a situation where I need to access a solrcloud behind a firewall.

 I have a tunnel enabled to one of the zookeeper as a starting points
 and the following test code:

  CloudSolrServer server = new CloudSolrServer("localhost:2181");
  server.setDefaultCollection("test");
  SolrPingResponse p = server.ping();
  System.out.println(p.getRequestUrl());

 Right now it just hangs without any errors... what additional ports
 need forwarding and other configurations need setting to access a
 solrcloud over a ssh tunnel or tunnels?


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center Intelligence
 Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
mailto:juergen.wag...@devoteam.com, URL: www.devoteam.de
http://www.devoteam.de/


Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




Solr 4.10 termsIndexInterval and termsIndexDivisor not supported with default PostingsFormat?

2014-09-16 Thread Tom Burton-West
Hello,

I think the documentation and example files for Solr 4.x need to be
updated.  If someone will let me know I'll be happy to fix the example
and perhaps someone with edit rights could fix the reference guide.

Due to dirty OCR and over 400 languages we have over 2 billion unique
terms in our index.  In Solr 3.6 we set termIndexInterval to 1024 (8
times the default of 128) to reduce the size of the in-memory index.
Previously we used termIndexDivisor for a similar purpose.

We suspect that in Solr 4.10 (and probably previous Solr 4.x versions)
termIndexInterval and termIndexDivisor do not apply to the default
codec and are probably unnecessary (since the default terms index now
uses a much more efficient representation).

According to the JavaDocs for IndexWriterConfig, the Lucene level
implementations of these do not apply to the default PostingsFormat
implementation.
http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/IndexWriterConfig.html#setReaderTermsIndexDivisor%28int%29

Despite this statement in the Lucene JavaDocs, in the
example/solrconfig.xml there is the following:

<!-- Expert: Controls how often Lucene loads terms into memory.
     Default is 128 and is likely good for most everyone.
-->
<!-- <termIndexInterval>128</termIndexInterval> -->

In the 4.10 reference manual page 365 there is also an example showing
the termIndexInterval.

Can someone please confirm that these two parameter settings
termIndexInterval and termsIndexDivisor, do not apply to the default
PostingsFormat for Solr 4.10?

Tom


How to preserve 0 after decimal point?

2014-09-16 Thread bbarani
I have a requirement to preserve 0 after decimal point, currently with the
below field type 

 <fieldType class="solr.SortableFloatField" name="sfloat" omitNorms="true"
sortMissingLast="true"/>

27.50 is stripped to 27.5
27.00 is stripped to 27.0
27.90 is stripped to 27.9

<float name="Price">27.5</float>

I also tried using double, but even then the 0's are getting stripped.

<double name="Price">27.5</double>

Input data:

<field name="Price">27.50</field>










Re: Access solr cloud via ssh tunnel?

2014-09-16 Thread rulinma
firewall effect this, try and test. good luck.





Re: How to preserve 0 after decimal point?

2014-09-16 Thread Erick Erickson
Whoa! First, you should really NOT be using sfloat (or pfloat) or any
of their variants unless you're waaay back on 1.4. Those were fine in
their time, but numeric types (float/tfloat and the rest) are vastly
preferred. Also more efficient in terms of CPU cycles and storage.

second, and assuming your problem is really that you're looking at the
_display_, you should get back exactly what you put in so I'm guessing
pilot error here.

But you need to provide more details, especially the Solr version and
how the output is displayed.

Best,
Erick
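
Purely as an illustration of the display point above: if the requirement is
only about how the value is rendered, formatting at display time gives the
trailing zero back. A tiny client-side sketch (the field name and the two
decimal places are just this example's assumptions, not a Solr API):

// Client-side rendering only: the value read back from a numeric Price field
// is just a float, so the trailing zero is a formatting concern.
float price = 27.5f;
String display = String.format("%.2f", price);
System.out.println(display);   // prints 27.50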

On Tue, Sep 16, 2014 at 5:15 PM, bbarani bbar...@gmail.com wrote:
 I have a requirement to preserve 0 after decimal point, currently with the
 below field type

  <fieldType class="solr.SortableFloatField" name="sfloat" omitNorms="true"
 sortMissingLast="true"/>

 27.50 is stripped to 27.5
 27.00 is stripped to 27.0
 27.90 is stripped to 27.9

 <float name="Price">27.5</float>

 I also tried using double, but even then the 0's are getting stripped.

 <double name="Price">27.5</double>

 Input data:

 <field name="Price">27.50</field>










MaxScore

2014-09-16 Thread William Bell
What we need is a function like scale(field,min,max) that only operates on
the documents that come back in the search results.

scale() takes the min, max from the field in the index, not necessarily
those in the results.

I cannot think of a solution. max() only looks at one field, not across
fields in the results.

I tried a query() but cannot think of a way to get the max value of a field
ONLY in the results...

Ideas?
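
One possible two-pass workaround, offered only as an idea and not a built-in
scale-over-results function: let the StatsComponent compute min/max over just
the matching documents, then feed those numbers into the function query of the
real request. A rough SolrJ sketch, with the URL, the field name "popularity"
and the boost wiring as assumptions (the boost parameter applies to edismax;
with other parsers the same function can go into bf or a sort):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ScaleOverResults {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String field = "popularity";            // hypothetical numeric field
    String userQuery = "some query";

    // Pass 1: StatsComponent min/max are computed over the matching docs only.
    SolrQuery stats = new SolrQuery(userQuery);
    stats.setRows(0);
    stats.set("stats", "true");
    stats.set("stats.field", field);
    FieldStatsInfo info = server.query(stats).getFieldStatsInfo().get(field);
    double min = ((Number) info.getMin()).doubleValue();
    double max = ((Number) info.getMax()).doubleValue();

    // Pass 2: bake the per-result-set min/max into the function query.
    double range = (max > min) ? (max - min) : 1.0;  // avoid divide-by-zero
    String scaled = String.format("div(sub(%s,%f),%f)", field, min, range);
    SolrQuery real = new SolrQuery(userQuery);
    real.set("defType", "edismax");
    real.set("boost", scaled);               // multiplies into the score
    QueryResponse rsp = server.query(real);
    System.out.println("numFound=" + rsp.getResults().getNumFound());

    server.shutdown();
  }
}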


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076