Re: [Neo4j] Neo4j with MapReduce inserts
Hello Sulabh,

We're going to need a little more information before we can help. Can you tell us how it fails? Are you trying to run a batch inserter on different databases on each of your parallel jobs?

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Neo4j with MapReduce inserts
Also, what technology are you writing those map-reduce jobs with (framework, runtime environment, etc.)? Some code samples would be great as well.

Cheers,
Michael

On 17.06.2011 at 22:24, Jim Webber wrote:
> Hello Sulabh, We're going to need a little more information before we can help. Can you tell us how it fails? Are you trying to run a batch inserter on different databases on each of your parallel jobs? Jim
Re: [Neo4j] Neo4j with MapReduce inserts
Well, as I mentioned, the code does not fail anywhere; it runs its full course and just skips the part that writes to the graph. I have just one graph, and I pass just one instance of the batchInserter to the map function. My code is in Scala; sample code is below:

    import org.apache.hadoop.io.{LongWritable, MapWritable, Text}
    import org.apache.hadoop.mapreduce.Reducer
    import org.neo4j.graphdb.DynamicRelationshipType

    class ExportReducer extends Reducer[Text, MapWritable, LongWritable, Text] {
      type Context = org.apache.hadoop.mapreduce.Reducer[Text, MapWritable, LongWritable, Text]#Context

      @throws(classOf[Exception])
      override def reduce(key: Text, value: java.lang.Iterable[MapWritable], context: Context) {
        // The key holds two parts separated by a colon, one per node.
        val keys: Array[String] = key.toString.split(":")
        val uri1 = "first" + keys(0)
        val uri2 = "last" + keys(1)

        // Create and index the first node.
        ExportReducerObject.propertiesUID.put("ID", uri1)
        val node1 = ExportReducerObject.batchInserter.createNode(ExportReducerObject.propertiesUID)
        ExportReducerObject.indexService.add(node1, ExportReducerObject.propertiesUID)

        // Create and index the second node.
        ExportReducerObject.propertiesCID.put("ID", uri2)
        val node2 = ExportReducerObject.batchInserter.createNode(ExportReducerObject.propertiesCID)
        ExportReducerObject.indexService.add(node2, ExportReducerObject.propertiesCID)

        // Connect the two nodes.
        ExportReducerObject.propertiesEdges.put("fullName", 1.0)
        ExportReducerObject.batchInserter.createRelationship(node1, node2,
          DynamicRelationshipType.withName("fullName"), ExportReducerObject.propertiesEdges)
      }
    }

My graph properties are defined as below:

    import org.neo4j.helpers.collection.MapUtil
    import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider
    import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl

    val batchInserter = new BatchInserterImpl("graph", BatchInserterImpl.loadProperties("neo4j.props"))
    val indexProvider = new LuceneBatchInserterIndexProvider(batchInserter)
    val indexService = indexProvider.nodeIndex("ID", MapUtil.stringMap("type", "exact"))

Mind that the code works perfectly (writes to the graph) when running in local mode.

On Fri, Jun 17, 2011 at 11:32 AM, sulabh choudhury sula...@gmail.com wrote:
> I am trying to write a MapReduce job that does Neo4j batch inserts. It works fine when I just run it like a Java file (runs in local mode) and does the insert, but when I try to run it in distributed mode it does not write to the graph. Is it an issue related to permissions? I have no clue where to look.

--
Thanks and Regards,
Sulabh Choudhury
Re: [Neo4j] Neo4j with MapReduce inserts
Hi Sulabh,

What do you mean by 'local' mode?

The batch inserter can only be used in a single-threaded environment. You shouldn't use it in a concurrent environment, as it will fail unpredictably. Please use the EmbeddedGraphDatabase instead.

Michael

On 17.06.2011 at 23:20, sulabh choudhury wrote:
> Well, as I mentioned, the code does not fail anywhere; it runs its full course and just skips the part that writes to the graph. I have just one graph, and I pass just one instance of the batchInserter to the map function.
> [...]
> Mind that the code works perfectly (writes to the graph) when running in local mode.
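A minimal sketch of the switch suggested above, assuming the Neo4j 1.x embedded API that was current at the time of this thread (`EmbeddedGraphDatabase`, `beginTx()`, `tx.finish()`). The object name `EmbeddedInsertSketch`, the helper `insertPair`, the sample URIs, and the `graph` store path are illustrative and not taken from the thread. Every write goes through a transaction, which is what lets several threads in one JVM share the same database instance:

```scala
import org.neo4j.graphdb.{DynamicRelationshipType, GraphDatabaseService}
import org.neo4j.kernel.EmbeddedGraphDatabase

object EmbeddedInsertSketch {
  // Hypothetical helper: creates two indexed nodes and one relationship in a
  // single transaction and returns the id of the new relationship.
  def insertPair(db: GraphDatabaseService, uri1: String, uri2: String): Long = {
    val index = db.index().forNodes("ID")
    val tx = db.beginTx()
    try {
      val node1 = db.createNode()
      node1.setProperty("ID", uri1)
      index.add(node1, "ID", uri1)

      val node2 = db.createNode()
      node2.setProperty("ID", uri2)
      index.add(node2, "ID", uri2)

      val rel = node1.createRelationshipTo(node2, DynamicRelationshipType.withName("fullName"))
      rel.setProperty("fullName", 1.0)

      tx.success() // mark the transaction for commit
      rel.getId
    } finally {
      tx.finish() // commits if success() was called, otherwise rolls back
    }
  }

  def main(args: Array[String]): Unit = {
    // Store directory is an assumption; pass a different path as the first argument.
    val storeDir = if (args.nonEmpty) args(0) else "graph"
    val db: GraphDatabaseService = new EmbeddedGraphDatabase(storeDir)
    try println(insertPair(db, "first-a", "last-b"))
    finally db.shutdown()
  }
}
```

Note that an embedded store can be opened by only one JVM at a time, so this helps with threads inside a single process, not with tasks spread across separate machines.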
Re: [Neo4j] Neo4j with MapReduce inserts
Are you saying that in an M/R environment each Map (or Reduce) process will try to have its own instance of batchInserter, and hence it would fail?

When I say local, I mean that the code works fine when I just use the M/R API, but it fails when I try to run it in distributed mode.

On Fri, Jun 17, 2011 at 2:25 PM, Michael Hunger michael.hun...@neotechnology.com wrote:
> Hi Sulabh, what do you mean by 'local' mode? The batch inserter can only be used in a single-threaded environment. You shouldn't use it in a concurrent environment, as it will fail unpredictably. Please use the EmbeddedGraphDatabase instead. Michael

--
Thanks and Regards,
Sulabh Choudhury
Re: [Neo4j] Neo4j with MapReduce inserts
No, that would be even worse. A single BatchInserter, and every graph database store that is currently being written to by a batch inserter, MUST be accessed from only one single-threaded environment. Please use the normal EmbeddedGraphDatabase for your multi-threaded MR jobs.

Cheers,
Michael

On 17.06.2011 at 23:38, sulabh choudhury wrote:
> Are you saying that in an M/R environment each Map (or Reduce) process will try to have its own instance of batchInserter, and hence it would fail? When I say local, I mean that the code works fine when I just use the M/R API, but it fails when I try to run it in distributed mode.
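To illustrate the single-threaded constraint above: one common way to keep a writer that is not thread-safe on exactly one thread is to funnel every write through a single-threaded executor. The sketch below uses only the JDK. `SingleThreadedSink` is a hypothetical stand-in, not a Neo4j class, and the approach only works within one JVM, so it is not a substitute for the EmbeddedGraphDatabase advice.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a writer that must only be used from one thread.
final class SingleThreadedSink {
  private val rows = ArrayBuffer.empty[String]
  def write(row: String): Unit = rows += row
  def size: Int = rows.size
}

object SingleWriterSketch {
  // Starts `producers` worker threads that each hand `perProducer` rows to a
  // dedicated writer thread; returns the number of rows the sink received.
  def writeAll(producers: Int, perProducer: Int): Int = {
    val sink = new SingleThreadedSink
    val writer = Executors.newSingleThreadExecutor() // the only thread that touches the sink
    val workers = Executors.newFixedThreadPool(producers)
    try {
      for (p <- 0 until producers) {
        workers.execute(new Runnable {
          def run(): Unit =
            for (i <- 0 until perProducer) {
              val row = s"$p:$i"
              // Workers never call the sink directly; they only enqueue work.
              writer.execute(new Runnable { def run(): Unit = sink.write(row) })
            }
        })
      }
      workers.shutdown()
      workers.awaitTermination(1, TimeUnit.MINUTES)
      // Read the size on the writer thread as well, queued after all writes.
      writer.submit(new Callable[Int] { def call(): Int = sink.size }).get()
    } finally {
      workers.shutdown()
      writer.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(writeAll(producers = 4, perProducer = 1000)) // prints 4000
}
```

The workers run concurrently, but the sink only ever sees calls from the writer thread, so no rows are lost.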
Re: [Neo4j] Neo4j with MapReduce inserts
Alright, thank you all.

On Fri, Jun 17, 2011 at 2:46 PM, Michael Hunger michael.hun...@neotechnology.com wrote:
> No, that would be even worse. A single BatchInserter, and every graph database store that is currently being written to by a batch inserter, MUST be accessed from only one single-threaded environment. Please use the normal EmbeddedGraphDatabase for your multi-threaded MR jobs. Cheers Michael

--
Thanks and Regards,
Sulabh Choudhury