[jira] [Updated] (CRUNCH-624) temporary table size is 0, which makes reducer number too small

2016-10-24 Thread Micah Whitacre (JIRA)

 [ 
https://issues.apache.org/jira/browse/CRUNCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Whitacre updated CRUNCH-624:
--
Fix Version/s: 0.15.0

> temporary table size is 0, which makes reducer number too small
> ---
>
> Key: CRUNCH-624
> URL: https://issues.apache.org/jira/browse/CRUNCH-624
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Reporter: JingChen
> Assignee: Josh Wills
> Fix For: 0.15.0
>
> Attachments: CRUNCH-624.patch
>
>
> If the pipeline produces a temporary table, the number of reducers for a 
> temporary table whose input is itself a temporary table may become very small 
> in some cases, because the input temporary table has no content yet.
> I believe I have found the root cause in my case:
> {code:title=PCollectionImpl.java|borderStyle=solid}
> public void materializeAt(SourceTarget sourceTarget) {
>   this.materializedAt = sourceTarget;
>   this.size = materializedAt.getSize(getPipeline().getConfiguration());
> }
> @Override
> public long getSize() {
>   if (size < 0) {
>     this.size = getSizeInternal();
>   }
>   return size;
> }
> {code}
> PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to 
> create a temporary table. The sourceTarget is bound to the new temporary 
> table, whose size is 0 because its path was only just created, so this.size 
> is set to 0. Later, when getSize() is invoked to set the number of reducers, 
> size is 0 rather than negative, so getSizeInternal() is never called and 
> getSize() simply returns 0, which makes the number of reducers too small.
> So I think materializeAt() should check the sourceTarget's size, like 
> below:
> {code:title=PCollectionImpl.java|borderStyle=solid}
> public void materializeAt(SourceTarget sourceTarget) {
>   this.materializedAt = sourceTarget;
>   long size = materializedAt.getSize(getPipeline().getConfiguration());
>   if (size > 0) {
>     this.size = size;
>   }
> }
> {code}
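
The caching behaviour described in the report can be reproduced outside Crunch with a minimal sketch. Everything here is a hypothetical stand-in: `CachedSizeCollection` only mimics the `size` field logic quoted above, and `reducers` is an assumed size-based estimate (one reducer plus one per `bytesPerReducer` of input), not the planner's actual code.

```java
/** Hypothetical stand-in for the size caching described in the report; not Crunch's classes. */
class SizeCacheSketch {

  /** Mimics the cached-size logic of PCollectionImpl as quoted in the issue. */
  static class CachedSizeCollection {
    private long size = -1L;           // negative means "not computed yet"
    private final long estimatedSize;  // what getSizeInternal() would derive (assumed value)
    private final boolean guardZero;   // true = apply the check proposed in the report

    CachedSizeCollection(long estimatedSize, boolean guardZero) {
      this.estimatedSize = estimatedSize;
      this.guardZero = guardZero;
    }

    /** targetSizeOnDisk plays the role of materializedAt.getSize(conf). */
    void materializeAt(long targetSizeOnDisk) {
      if (!guardZero || targetSizeOnDisk > 0) {
        this.size = targetSizeOnDisk;
      }
    }

    long getSize() {
      if (size < 0) {
        size = estimatedSize;  // stands in for getSizeInternal()
      }
      return size;
    }
  }

  /** Assumed estimate: one reducer, plus one more per bytesPerReducer of input. */
  static int reducers(long inputSize, long bytesPerReducer) {
    return 1 + (int) (inputSize / bytesPerReducer);
  }

  public static void main(String[] args) {
    long oneGb = 1_000_000_000L;

    // Original behaviour: the empty, freshly created path overwrites the cache with 0.
    CachedSizeCollection original = new CachedSizeCollection(50 * oneGb, false);
    original.materializeAt(0L);

    // Proposed behaviour: a size of 0 is ignored, so the estimate is still used.
    CachedSizeCollection patched = new CachedSizeCollection(50 * oneGb, true);
    patched.materializeAt(0L);

    System.out.println("original: " + reducers(original.getSize(), oneGb)); // original: 1
    System.out.println("patched: " + reducers(patched.getSize(), oneGb));   // patched: 51
  }
}
```

Under these assumptions the unguarded version caches 0 and the estimate collapses to a single reducer, while the guarded version falls back to the internal estimate.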



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CRUNCH-624) temporary table size is 0, which makes reducer number too small

2016-10-21 Thread Josh Wills (JIRA)

 [ 
https://issues.apache.org/jira/browse/CRUNCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Wills updated CRUNCH-624:
--
Attachment: CRUNCH-624.patch

Patch for this.






[jira] [Updated] (CRUNCH-624) temporary table size is 0, which makes reducer number too small

2016-10-18 Thread JingChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CRUNCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JingChen updated CRUNCH-624:

Description: 
If the pipeline produces a temporary table, the number of reducers for a 
temporary table whose input is itself a temporary table may become very small 
in some cases, because the input temporary table has no content yet.

I believe I have found the root cause in my case:
{code:title=PCollectionImpl.java|borderStyle=solid}
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  this.size = materializedAt.getSize(getPipeline().getConfiguration());
}

@Override
public long getSize() {
  if (size < 0) {
    this.size = getSizeInternal();
  }
  return size;
}
{code}
PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to 
create a temporary table. The sourceTarget is bound to the new temporary table, 
whose size is 0 because its path was only just created, so this.size is set to 
0. Later, when getSize() is invoked to set the number of reducers, size is 0 
rather than negative, so getSizeInternal() is never called and getSize() simply 
returns 0, which makes the number of reducers too small.

So I think materializeAt() should check the sourceTarget's size, like below:
{code:title=PCollectionImpl.java|borderStyle=solid}
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  long size = materializedAt.getSize(getPipeline().getConfiguration());
  if (size > 0) {
    this.size = size;
  }
}
{code}

  was:
If the pipeline produces a temporary table, the number of reducers for a 
temporary table whose input is itself a temporary table may become very small 
in some cases, because the input temporary table has no content yet.

I believe I have found the root cause in my case:
{code:title=Bar.java|borderStyle=solid}
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  this.size = materializedAt.getSize(getPipeline().getConfiguration());
}

@Override
public long getSize() {
  if (size < 0) {
    this.size = getSizeInternal();
  }
  return size;
}
{code}
PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to 
create a temporary table. The sourceTarget is bound to the new temporary table, 
whose size is 0 because its path was only just created, so this.size is set to 
0. Later, when getSize() is invoked to set the number of reducers, size is 0 
rather than negative, so getSizeInternal() is never called and getSize() simply 
returns 0, which makes the number of reducers too small.

So I think materializeAt() should check the sourceTarget's size, like below:
{code:title=Bar.java|borderStyle=solid}
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  long size = materializedAt.getSize(getPipeline().getConfiguration());
  if (size > 0) {
    this.size = size;
  }
}
{code}







[jira] [Updated] (CRUNCH-624) temporary table size is 0, which makes reducer number too small

2016-10-18 Thread JingChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CRUNCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JingChen updated CRUNCH-624:

Description: 
If the pipeline produces a temporary table, the number of reducers for a 
temporary table whose input is itself a temporary table may become very small 
in some cases, because the input temporary table has no content yet.

I believe I have found the root cause in my case:
{code:title=Bar.java|borderStyle=solid}
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  this.size = materializedAt.getSize(getPipeline().getConfiguration());
}

@Override
public long getSize() {
  if (size < 0) {
    this.size = getSizeInternal();
  }
  return size;
}
{code}
PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to 
create a temporary table. The sourceTarget is bound to the new temporary table, 
whose size is 0 because its path was only just created, so this.size is set to 
0. Later, when getSize() is invoked to set the number of reducers, size is 0 
rather than negative, so getSizeInternal() is never called and getSize() simply 
returns 0, which makes the number of reducers too small.

So I think materializeAt() should check the sourceTarget's size, like below:
{code:title=Bar.java|borderStyle=solid}
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  long size = materializedAt.getSize(getPipeline().getConfiguration());
  if (size > 0) {
    this.size = size;
  }
}
{code}

  was:
If the pipeline produces a temporary table, the number of reducers for a 
temporary table whose input is itself a temporary table may become very small 
in some cases, because the input temporary table has no content yet.

I believe I have found the root cause in my case:

public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  this.size = materializedAt.getSize(getPipeline().getConfiguration());
}

@Override
public long getSize() {
  if (size < 0) {
    this.size = getSizeInternal();
  }
  return size;
}

PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to 
create a temporary table. The sourceTarget is bound to the new temporary table, 
whose size is 0 because its path was only just created, so this.size is set to 
0. Later, when getSize() is invoked to set the number of reducers, size is 0 
rather than negative, so getSizeInternal() is never called and getSize() simply 
returns 0, which makes the number of reducers too small.
So I think materializeAt() should check the sourceTarget's size, like below:
public void materializeAt(SourceTarget sourceTarget) {
  this.materializedAt = sourceTarget;
  long size = materializedAt.getSize(getPipeline().getConfiguration());
  if (size > 0) {
    this.size = size;
  }
}




