[jira] (METRON-678) Multithread the flat file loader

2017-01-31 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/METRON-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847439#comment-15847439 ]

ASF GitHub Bot commented on METRON-678:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-metron/pull/428


> Multithread the flat file loader
>
>                 Key: METRON-678
>                 URL: https://issues.apache.org/jira/browse/METRON-678
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>
> Currently the flat file loader is single-threaded when writing to HBase.
> We could make this a lot faster by multithreading the HBase puts.
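The issue does not say how the puts would be parallelized. The following is a minimal sketch of one common Java idiom, assumed here and not taken from pull request 428: write the lines from a parallel stream that runs inside a dedicated `ForkJoinPool`, whose parallelism sets the number of writer threads. `PutSink`, `InMemorySink`, `load` and `numThreads` are hypothetical names. `PutSink` stands in for an HBase table and is not the HBase client API. The sketch also relies on the widely used, though not formally specified, behavior that a parallel stream started from a `ForkJoinPool` task runs on that pool's workers.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPutSketch {

  /** Hypothetical stand-in for an HBase table; this is NOT the HBase client API. */
  interface PutSink {
    void put(String rowKey, String value);
  }

  /** Thread-safe in-memory sink so the sketch runs without an HBase cluster. */
  static class InMemorySink implements PutSink {
    final Map<String, String> rows = new ConcurrentHashMap<>();

    @Override
    public void put(String rowKey, String value) {
      rows.put(rowKey, value);
    }
  }

  /**
   * Writes each "key,value" line to the sink from a pool with a parallelism of
   * numThreads. Starting the parallel stream inside a dedicated ForkJoinPool
   * keeps it from competing with other users of the common pool.
   */
  static void load(List<String> lines, PutSink sink, int numThreads)
      throws InterruptedException, ExecutionException {
    ForkJoinPool pool = new ForkJoinPool(numThreads);
    try {
      pool.submit(() ->
          lines.parallelStream()
              .forEach(line -> sink.put(line.split(",", 2)[0], line))
      ).get(); // wait for every put; a failed put surfaces as ExecutionException
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> lines = IntStream.range(0, 1000)
        .mapToObj(i -> "key" + i + ",value" + i)
        .collect(Collectors.toList());
    InMemorySink sink = new InMemorySink();
    load(lines, sink, 4);
    System.out.println(sink.rows.size()); // 1000
  }
}
```

The sink has to be safe for concurrent use. In the HBase client a `Table` handle is generally documented as not thread-safe, so a real loader would need per-thread handles or similar. That part is outside this sketch.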



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] (METRON-678) Multithread the flat file loader

2017-01-30 Thread ASF GitHub Bot (JIRA)

ASF GitHub Bot commented on METRON-678:
---

Github user cestella commented on a diff in the pull request: 
https://github.com/apache/incubator-metron/pull/428#discussion_r98559714

--- Diff: metron-platform/metron-common/src/test/java/org/apache/metron/common/utils/file/ReaderSpliteratorTest.java ---
@@ -0,0 +1,185 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.common.utils.file;
+
+import org.adrianwalker.multilinestring.Multiline;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.*;
+import java.nio.file.Files;
+import java.nio.file.OpenOption;
+import java.nio.file.StandardOpenOption;
+import java.util.Map;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ForkJoinPool;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class ReaderSpliteratorTest {
+  /**
+   foo
+   bar
+   grok
+   foo
+   the
+   and
+   grok
+   foo
+   bar
+   */
+  @Multiline
+  public static String data;
+  public static final File dataFile = new File("target/readerspliteratortest.data");
+
+  @BeforeClass
+  public static void setup() throws IOException {
+    if(dataFile.exists()) {
+      dataFile.delete();
+    }
+    Files.write(dataFile.toPath(), data.getBytes(), StandardOpenOption.CREATE_NEW, StandardOpenOption.TRUNCATE_EXISTING);
+    dataFile.deleteOnExit();
+  }
+
+  public static BufferedReader getReader() throws FileNotFoundException {
+    return new BufferedReader(new FileReader(dataFile));
+  }
+
+  @Test
+  public void testParallelStreamSmallBatch() throws FileNotFoundException {
+    try( Stream<String> stream = ReaderSpliterator.lineStream(getReader(), 2)) {
+
+      Map<String, Integer> count =
+              stream.parallel().map( s -> s.trim())
+                      .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
+      Assert.assertEquals(5, count.size());
+      Assert.assertEquals(3, (int)count.get("foo"));
+      Assert.assertEquals(2, (int)count.get("bar"));
+      Assert.assertEquals(1, (int)count.get("and"));
+      Assert.assertEquals(1, (int)count.get("the"));
+    }
+  }
+
+  @Test
+  public void testParallelStreamLargeBatch() throws FileNotFoundException {
+    try( Stream<String> stream = ReaderSpliterator.lineStream(getReader(), 100)) {
+      Map<String, Integer> count =
+              stream.parallel().map(s -> s.trim())
+                      .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
+      Assert.assertEquals(5, count.size());
+      Assert.assertEquals(3, (int) count.get("foo"));
+      Assert.assertEquals(2, (int) count.get("bar"));
--- End diff --

 hah, no, not intentionally. I can add it in. 
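The quoted tests read the same nine-line file through `ReaderSpliterator.lineStream(getReader(), 2)` and `lineStream(getReader(), 100)` and expect identical word counts from a parallel stream. The `ReaderSpliterator` source is not part of this thread. The following is a minimal sketch of one way such a spliterator can work, assuming a batching design: `trySplit` pulls up to `batchSize` lines off the shared `BufferedReader` and hands them to another worker as a list-backed spliterator. `BatchingLineSpliterator` and its `lineStream` method are hypothetical names, not Metron's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical batching spliterator; not Metron's ReaderSpliterator source. */
public class BatchingLineSpliterator implements Spliterator<String> {
  private final BufferedReader reader;
  private final int batchSize;

  BatchingLineSpliterator(BufferedReader reader, int batchSize) {
    this.reader = reader;
    this.batchSize = batchSize;
  }

  /** Sequential path: hands the next line, if any, to the action. */
  @Override
  public boolean tryAdvance(Consumer<? super String> action) {
    String line = readLine();
    if (line == null) {
      return false;
    }
    action.accept(line);
    return true;
  }

  /**
   * Parallel path: reads up to batchSize lines (a prefix of what remains) and
   * returns a spliterator over them for another worker. Returns null once the
   * reader is exhausted, which tells the framework to stop splitting.
   */
  @Override
  public Spliterator<String> trySplit() {
    List<String> batch = new ArrayList<>(batchSize);
    String line;
    while (batch.size() < batchSize && (line = readLine()) != null) {
      batch.add(line);
    }
    return batch.isEmpty() ? null : Spliterators.spliterator(batch, characteristics());
  }

  @Override
  public long estimateSize() {
    return Long.MAX_VALUE; // unknown without reading the whole input
  }

  @Override
  public int characteristics() {
    return ORDERED | NONNULL;
  }

  private String readLine() {
    try {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Hypothetical analogue of lineStream(reader, batchSize); closes the reader on close(). */
  static Stream<String> lineStream(BufferedReader reader, int batchSize) {
    return StreamSupport.stream(new BatchingLineSpliterator(reader, batchSize), false)
        .onClose(() -> {
          try {
            reader.close();
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
  }

  public static void main(String[] args) {
    String data = "foo\nbar\ngrok\nfoo\nthe\nand\ngrok\nfoo\nbar\n";
    try (Stream<String> stream =
             lineStream(new BufferedReader(new StringReader(data)), 2)) {
      Map<String, Integer> count = stream.parallel()
          .map(String::trim)
          .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
      System.out.println(count.get("grok")); // 2
    }
  }
}
```

With a batch size larger than the input, as in the second test (100 against nine lines), the first `trySplit` in this sketch takes every line, and any further parallelism comes from splitting that list-backed batch. The `main` method also checks `grok`, which the quoted assertions do not cover.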

[jira] (METRON-678) Multithread the flat file loader

2017-01-30 Thread ASF GitHub Bot (JIRA)

ASF GitHub Bot commented on METRON-678:
---

Github user mmiklavc commented on the issue: 
 https://github.com/apache/incubator-metron/pull/428 
 Manual tests all checked out. Code looks good to me. The parallelism tests were a nice addition. +1 